GROUP B STREPTOCOCCUS

This application incorporates by reference the contents of each of two duplicate CD-ROMs
which contain an identical 90.1 MB file labeled "PP28007 PCT sequence listing.txt," which is the
sequence listing for this application. The CD-ROMs were created on December 21, 2005. This
application also incorporates by reference the contents of each of two duplicate CD-ROMs which
contain an identical 681 KB file labeled "Table I.txt" and containing Table I. The CD-ROMs were
created on December 20, 2005.

All documents cited herein are incorporated by reference in their entirety. 

TECHNICAL FIELD 

This invention is in the field of Streptococcus biology, and in particular relates to S. agalactiae, also
known as 'group B streptococcus' (GBS).

BACKGROUND ART 

Once thought to infect only cows, the Gram-positive bacterium Streptococcus agalactiae (or "group
B streptococcus", abbreviated to "GBS") is now known to cause serious disease, bacteremia and
meningitis, in immunocompromised individuals and in neonates. There are two types of neonatal
infection. The first (early onset, usually within 5 days of birth) is manifested by bacteremia and
pneumonia. It is contracted vertically as a baby passes through the birth canal. GBS colonises the
vagina of about 25% of young women, and approximately 1% of infants born via a vaginal birth to
colonised mothers will become infected. Mortality is between 50% and 70%. The second (late onset)
is a meningitis that occurs 10 to 60 days after birth. If pregnant women are vaccinated with type III
capsule so that the infants are passively immunised, the incidence of the late onset meningitis is
reduced but is not entirely eliminated.

The "B" in "GBS" refers to the Lancefield classification, which is based on the antigenicity of a
carbohydrate which is soluble in dilute acid and called the C carbohydrate. Lancefield identified 13
types of C carbohydrate, designated A to O, that could be serologically differentiated. The organisms
that most commonly infect humans are found in groups A, B, D, and G. Within group B, strains can
be divided into 8 serotypes (Ia, Ib, Ia/c, II, III, IV, V, and VI) based on the structure of their
polysaccharide capsule. The genome sequence of a serotype V strain of GBS has been published and
analysed [1,2], including a comparative genome hybridization analysis of 19 disease-causing isolates
of the same type V strain 2603 V/R. The genome sequence of a serotype III strain is also known [3].

Current GBS vaccines are based on polysaccharide antigens, although these suffer from poor 
immunogenicity. Anti-idiotypic approaches have also been used (e.g. ref. 4). There remains a need,
however, for effective adult vaccines against S. agalactiae infection. 

It is an object of the invention to provide proteins which can be used in the development of such 
vaccines. The proteins may also be useful for diagnostic purposes, and as targets for antibiotics.

DISCLOSURE OF THE INVENTION

Polypeptides 

The invention provides polypeptides comprising the GBS amino acid sequences disclosed in the 
examples. These amino acid sequences are the even SEQ ID NOs between 2 and 22740. There are 
thus 11370 amino acid sequences. The polypeptides encoded by sequences listed in Table IV have
not previously been seen in GBS strains. 
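
The numbering convention above can be checked with a few lines of Python. This is only an illustrative sketch: the constant and helper names are hypothetical and are not part of the sequence listing.

```python
# Amino acid sequences are the even SEQ ID NOs from 2 to 22740 (per the text above).
LAST_SEQ_ID = 22740  # highest SEQ ID NO stated in the text


def is_amino_acid_seq_id(seq_id: int) -> bool:
    """Return True if seq_id is one of the even (amino acid) SEQ ID NOs."""
    return 2 <= seq_id <= LAST_SEQ_ID and seq_id % 2 == 0


amino_acid_ids = range(2, LAST_SEQ_ID + 1, 2)
print(len(amino_acid_ids))  # 11370
```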

The invention also provides polypeptides comprising amino acid sequences that have sequence 
identity to the GBS amino acid sequences disclosed in the examples. Depending on the particular 
sequence, the degree of sequence identity is preferably greater than 50% (e.g. 60%, 70%, 75%, 80%,
85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more). These polypeptides include
homologs, orthologs, allelic variants and functional mutants. Typically, 50% identity or more
between two polypeptide sequences is considered to be an indication of functional equivalence.
Identity between polypeptides is preferably determined by the Smith-Waterman homology search
algorithm as implemented in the MPSRCH program (Oxford Molecular), using an affine gap search
with parameters gap open penalty=12 and gap extension penalty=1.
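
The following is a minimal sketch of a Smith-Waterman local alignment with affine gap penalties, included only to illustrate the kind of calculation referred to above. It is not the MPSRCH implementation. Several details are assumptions made for the example: simple match/mismatch scores (5 and -4) stand in for a substitution matrix, a gap of length k is charged gap_open + (k - 1) x gap_extend, and percent identity is taken over all aligned columns including gap columns. The function names and the test sequences are hypothetical.

```python
def smith_waterman_affine(a, b, match=5, mismatch=-4, gap_open=12, gap_extend=1):
    """Best local alignment of a and b with affine gaps (Gotoh recurrences).

    Returns (score, identities, alignment_length).
    """
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]        # best score ending at (i, j)
    E = [[neg_inf] * cols for _ in range(rows)]  # best score ending with a gap in a
    F = [[neg_inf] * cols for _ in range(rows)]  # best score ending with a gap in b
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell, counting identical and total columns.
    i, j = best_pos
    state = "H"
    identities = length = 0
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break
            s = match if a[i - 1] == b[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:
                identities += a[i - 1] == b[j - 1]
                length += 1
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":   # gap column: b[j-1] aligned to nothing in a
            length += 1
            if E[i][j] == H[i][j - 1] - gap_open:
                state = "H"
            j -= 1
        else:                # gap column: a[i-1] aligned to nothing in b
            length += 1
            if F[i][j] == H[i - 1][j] - gap_open:
                state = "H"
            i -= 1
    return best, identities, length


def percent_identity(a, b, **kwargs):
    """Percent identity over the columns of the best local alignment."""
    _, identities, length = smith_waterman_affine(a, b, **kwargs)
    return 100.0 * identities / length if length else 0.0


print(percent_identity("MKTAYIAK", "MKTAYLAK"))  # 87.5
```

A real comparison of polypeptides would normally use a substitution matrix and an established alignment tool; the sketch only shows how the gap open and gap extension penalties enter the recurrences.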

These polypeptides may, compared to the GBS sequences of the examples, include one or more (e.g.
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc.) conservative amino acid replacements i.e. replacements of one amino
acid with another which has a related side chain. Genetically-encoded amino acids are generally
divided into four families: (1) acidic i.e. aspartate, glutamate; (2) basic i.e. lysine, arginine, histidine;
(3) non-polar i.e. alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan;
and (4) uncharged polar i.e. glycine, asparagine, glutamine, cysteine, serine, threonine, tyrosine.
Phenylalanine, tryptophan, and tyrosine are sometimes classified jointly as aromatic amino acids. In
general, substitution of single amino acids within these families does not have a major effect on the
biological activity. The polypeptides may have one or more (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc.)
single amino acid deletions relative to the GBS sequences of the examples. The polypeptides may
also include one or more (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc.) insertions (e.g. each of 1, 2, 3, 4 or 5
amino acids) relative to the GBS sequences of the examples. Some of these deletions, insertions or
substitutions may convert one sequence of the invention to another sequence of the invention e.g.
amino acids 180-230 of SEQ ID NO: 8614 (identical to amino acids 173-223 of SEQ ID NO: 14060
and amino acids 4-54 of SEQ ID NO: 3916) become amino acids 180-230 of SEQ ID NO: 12908 by
conservative substitution of Ile-185 for Val.
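
The four families listed above can be written as a small lookup table. The sketch below is illustrative only; it uses one-letter amino acid codes, and the names FAMILIES, family_of and is_conservative are hypothetical.

```python
# The four side-chain families named in the text, as one-letter codes.
FAMILIES = {
    "acidic": set("DE"),             # aspartate, glutamate
    "basic": set("KRH"),             # lysine, arginine, histidine
    "non-polar": set("AVLIPFMW"),    # Ala, Val, Leu, Ile, Pro, Phe, Met, Trp
    "uncharged polar": set("GNQCSTY"),  # Gly, Asn, Gln, Cys, Ser, Thr, Tyr
}


def family_of(residue: str) -> str:
    """Return the family name for a genetically-encoded amino acid."""
    for name, members in FAMILIES.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def is_conservative(old: str, new: str) -> bool:
    """True if old -> new replaces a residue with a different one from the same family."""
    return old != new and family_of(old) == family_of(new)


print(is_conservative("I", "V"))  # True: isoleucine and valine are both non-polar
```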

Preferred polypeptides of the invention are listed below, including polypeptides that are lipidated, 
that are located in the outer membrane, that are located in the inner membrane, or that are located in 
the periplasm. Particularly preferred polypeptides are those that fall into more than one of these 
categories e.g. lipidated polypeptides that are located in the outer membrane. Lipoproteins may have
an N-terminal cysteine to which lipid is covalently attached, following post-translational processing of
the signal peptide. 

The invention further provides polypeptides comprising fragments of the GBS amino acid sequences
disclosed in the examples. The fragments should comprise at least n consecutive amino acids from
the sequences and, depending on the particular sequence, n is 7 or more (e.g. 8, 10, 12, 14, 16, 18,
20, 22, 24, 26, 28, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 or more).

The fragment may comprise at least one T-cell or, preferably, a B-cell epitope of the sequence. T- 
and B-cell epitopes can be identified empirically (e.g. using PEPSCAN [5,6] or similar methods), or
they can be predicted (e.g. using the Jameson-Wolf antigenic index [7], matrix-based approaches [8],
TEPITOPE [9], neural networks [10], OptiMer & EpiMer [11,12], ADEPT [13], Tsites [14], 
hydrophilicity [15], antigenic index [16] or the methods disclosed in reference 17, etc.). Other 
preferred fragments are (a) the N-terminal signal peptides of the GBS polypeptides of the invention, 
(b) the GBS polypeptides, but without their N-terminal signal peptides, (c) the GBS polypeptides, but 
without their N-terminal amino acid residue. 

Further preferred fragments are those common to at least two (e.g. 2, 3, 4 or 5) homologous coding
sequences, and in particular those common to homologous coding sequences within the sequence 
listing. Table II shows homologous SEQ ID numbers for nucleic acids within the sequence listing 
e.g. SEQ ID NOs: 88, 4374, 8834, 13214 and 17994 are homologous within the sequence listing, and 
are also homologous with prior art GI sequences 22533036 and 23094457. Simple alignments show 
that amino acids 1-131 of these five SEQ ID NOs are common, as are amino acids 133-176, 178-182, 
184-190, 192-217, 219-250, 252-278, 280-322, 324-366, 368-373 and 375-434. Similarly, 1-176 are 
common to SEQ ID NOs: 88, 4374, 8834 and 13214, but not to 17994. Thus fragments 1-131, 1-176 
and 133-176 are all preferred fragments of the invention. In some cases, where homologous 
sequences are 100% identical between strains along their complete lengths (e.g. SEQ ID NOs: 2,
8616, 12910, 14062 and 22384), the common 'fragment' will in fact be the complete sequence. 
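
The "simple alignments" described above amount to finding runs of positions at which every homolog carries the same residue. A minimal sketch follows, assuming the sequences are already aligned position-for-position without gaps; the function name, the min_len parameter and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
def common_runs(seqs, min_len=7):
    """Return 1-based inclusive (start, end) ranges where all sequences agree.

    Only runs of at least min_len residues are reported (the text above uses
    fragments of 7 or more amino acids).
    """
    length = min(len(s) for s in seqs)
    runs = []
    start = None
    for pos in range(length):
        same = all(s[pos] == seqs[0][pos] for s in seqs)
        if same and start is None:
            start = pos                      # a new common run begins
        elif not same and start is not None:
            if pos - start >= min_len:
                runs.append((start + 1, pos))
            start = None                     # the run ended at pos - 1
    if start is not None and length - start >= min_len:
        runs.append((start + 1, length))
    return runs


# Toy example: position 4 differs in the second sequence.
print(common_runs(["MKTAYS", "MKTSYS", "MKTAYS"], min_len=2))  # [(1, 3), (5, 6)]
```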

Other preferred fragments are those that begin with an amino acid encoded by a potential start codon 
(ATG, GTG, TTG). Fragments starting at the methionine encoded by a start codon downstream of 
the indicated start codon are polypeptides of the invention. 

Polypeptides of the invention can be prepared in many ways e.g. by chemical synthesis (in whole or
in part), by digesting longer polypeptides using proteases, by translation from RNA, by purification
from cell culture (e.g. from recombinant expression), from the organism itself (e.g. after bacterial
culture, or direct from patients), etc. A preferred method for production of peptides <40 amino acids
long involves in vitro chemical synthesis [18,19]. Solid-phase peptide synthesis is particularly
preferred, such as methods based on tBoc or Fmoc [20] chemistry. Enzymatic synthesis [21] may
also be used in part or in full. As an alternative to chemical synthesis, biological synthesis may be
used e.g. the polypeptides may be produced by translation. This may be carried out in vitro or in vivo.
Biological methods are in general restricted to the production of polypeptides based on L-amino
acids, but manipulation of translation machinery (e.g. of aminoacyl tRNA molecules) can be used to
allow the introduction of D-amino acids (or of other non-natural amino acids, such as iodotyrosine or
methylphenylalanine, azidohomoalanine, etc.) [22]. Where D-amino acids are included, however, it
is preferred to use chemical synthesis. Polypeptides of the invention may have covalent
modifications at the C-terminus and/or N-terminus.

Polypeptides of the invention can take various forms (e.g. native, fusions, glycosylated,
non-glycosylated, lipidated, non-lipidated, phosphorylated, non-phosphorylated, myristoylated,
non-myristoylated, monomeric, multimeric, particulate, denatured, etc.).

Polypeptides of the invention are preferably provided in purified or substantially purified form i.e. 
substantially free from other polypeptides (e.g. free from naturally-occurring polypeptides), 
particularly from other streptococcal or host cell polypeptides, and are generally at least about 50% 
pure (by weight), and usually at least about 90% pure i.e. less than about 50%, and more preferably
less than about 10% (e.g. 5%) of a composition is made up of other expressed polypeptides. 
Polypeptides of the invention are preferably GBS polypeptides. Polypeptides of the invention 
preferably have the function indicated in Table I for the relevant sequence. 

Polypeptides of the invention may be attached to a solid support. Polypeptides of the invention may 
comprise a detectable label (e.g. a radioactive or fluorescent label, or a biotin label).

The term "polypeptide" refers to amino acid polymers of any length. The polymer may be linear or
branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids. The
term also encompasses an amino acid polymer that has been modified naturally or by intervention; for
example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any
other manipulation or modification, such as conjugation with a labeling component. Also included
within the definition are, for example, polypeptides containing one or more analogs of an amino acid
(including, for example, unnatural amino acids, etc.), as well as other modifications known in the art.
Polypeptides can occur as single chains or associated chains. Polypeptides of the invention can be
naturally or non-naturally glycosylated (i.e. the polypeptide has a glycosylation pattern that differs
from the glycosylation pattern found in the corresponding naturally occurring polypeptide).

Polypeptides of the invention may be at least 40 amino acids long (e.g. at least 40, 50, 60, 70, 80, 90, 
100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300, 350, 400, 450, 500 or more). Polypeptides of 
the invention may be shorter than 500 amino acids (e.g. no longer than 40, 50, 60, 70, 80, 90, 100, 
120, 140, 160, 180, 200, 220, 240, 260, 280, 300, 350, 400 or 450 amino acids).

The invention provides polypeptides comprising a sequence -X-Y- or -Y-X-, wherein: -X- is an
amino acid sequence as defined above and -Y- is not a sequence as defined above i.e. the invention 
provides fusion proteins. Where the N-terminus codon of a polypeptide-coding sequence is not ATG 
then that codon will be translated as the standard amino acid for that codon rather than as a Met, 
which occurs when the codon is translated as a start codon. 

The invention provides a process for producing polypeptides of the invention, comprising the step of
culturing a host cell of the invention under conditions which induce polypeptide expression.

The invention provides a process for producing a polypeptide of the invention, wherein the
polypeptide is synthesised in part or in whole using chemical means.

The invention provides a composition comprising two or more polypeptides of the invention. 

The invention also provides a hybrid polypeptide represented by the formula NH2-A-[-X-L-]n-B-COOH,
wherein X is a polypeptide of the invention as defined above, L is an optional linker amino
acid sequence, A is an optional N-terminal amino acid sequence, B is an optional C-terminal amino
acid sequence, and n is an integer greater than 1. The value of n is between 2 and x, and the value of
x is typically 3, 4, 5, 6, 7, 8, 9 or 10. Preferably n is 2, 3 or 4; it is more preferably 2 or 3; most
preferably, n = 2. For each of the n instances, -X- may be the same or different. For each of the n
instances of [-X-L-], linker amino acid sequence -L- may be present or absent. For instance, when n=2
the hybrid may be NH2-X1-L1-X2-L2-COOH, NH2-X1-X2-COOH, NH2-X1-L1-X2-COOH,
NH2-X1-X2-L2-COOH, etc. Linker amino acid sequence(s) -L- will typically be short (e.g. 20 or fewer
amino acids i.e. 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1). Examples include short
peptide sequences which facilitate cloning, poly-glycine linkers (i.e. Gly_n where n = 2, 3, 4, 5, 6, 7,
8, 9, 10 or more), and histidine tags (i.e. His_n where n = 3, 4, 5, 6, 7, 8, 9, 10 or more). Other suitable
linker amino acid sequences will be apparent to those skilled in the art. -A- and -B- are optional
sequences which will typically be short (e.g. 40 or fewer amino acids i.e. 39, 38, 37, 36, 35, 34, 33, 32,
31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1).
Examples include leader sequences to direct polypeptide trafficking, or short peptide sequences
which facilitate cloning or purification (e.g. histidine tags i.e. His_n where n = 3, 4, 5, 6, 7, 8, 9, 10 or
more). Other suitable N-terminal and C-terminal amino acid sequences will be apparent to those
skilled in the art.
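
The layout of the hybrid formula can be illustrated by concatenating one-letter sequences in N-terminus to C-terminus order. This is only a sketch of the arrangement: the function name build_hybrid and the toy sequences are hypothetical, and real constructs would be designed at the nucleic acid level.

```python
def build_hybrid(xs, linkers=None, a="", b=""):
    """Assemble A-[X-L]n-B as a one-letter amino acid string (N- to C-terminus).

    xs      -- the n polypeptide sequences X1..Xn (n must be at least 2)
    linkers -- optional list of n linker sequences L1..Ln; "" means absent
    a, b    -- optional N-terminal and C-terminal sequences
    """
    if len(xs) < 2:
        raise ValueError("n must be an integer greater than 1")
    if linkers is None:
        linkers = [""] * len(xs)
    if len(linkers) != len(xs):
        raise ValueError("one linker entry (possibly empty) is needed per X")
    body = "".join(x + linker for x, linker in zip(xs, linkers))
    return a + body + b


# n = 2, with a Gly4 linker after X1, no linker after X2, and a C-terminal His6 tag.
print(build_hybrid(["MKT", "AYS"], linkers=["GGGG", ""], b="HHHHHH"))
# MKTGGGGAYSHHHHHH
```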

Various tests can be used to assess the in vivo immunogenicity of polypeptides of the invention. For 
example, polypeptides can be expressed recombinantly and used to screen patient sera by 
immunoblot. A positive reaction between the polypeptide and patient serum indicates that the patient
has previously mounted an immune response to the protein in question i.e. the protein is an 
immunogen. This method can also be used to identify immunodominant proteins. 

Antibodies 

The invention provides antibodies that bind to polypeptides of the invention. These may be
polyclonal or monoclonal and may be produced by any suitable means (e.g. by recombinant
expression). To increase compatibility with the human immune system, the antibodies may be 
chimeric or humanised [e.g. refs. 23 & 24], or fully human antibodies may be used. The antibodies 
may include a detectable label (e.g. for diagnostic assays). Antibodies of the invention may be 
attached to a solid support. Antibodies of the invention are preferably neutralising antibodies. 

Monoclonal antibodies are particularly useful in identification and purification of the individual
polypeptides against which they are directed. Monoclonal antibodies of the invention may also be
employed as reagents in immunoassays, radioimmunoassays (RIA) or enzyme-linked immunosorbent
assays (ELISA), etc. In these applications, the antibodies can be labelled with an analytically-
detectable reagent such as a radioisotope, a fluorescent molecule or an enzyme. The monoclonal 
antibodies produced by the above method may also be used for the molecular identification and 
characterization (epitope mapping) of polypeptides of the invention. 

Antibodies of the invention are preferably specific to Streptococci i.e. they bind preferentially to 
Streptococci bacteria relative to non-Streptococci bacteria. More preferably, the antibodies are
specific to GBS i.e. they bind preferentially to GBS bacteria relative to non-type-b streptococci. 

Antibodies of the invention are preferably provided in purified or substantially purified form. 
Typically, the antibody will be present in a composition that is substantially free of other 
polypeptides e.g. where less than 90% (by weight), usually less than 60% and more usually less than 
50% of the composition is made up of other polypeptides. 

Antibodies of the invention can be of any isotype (e.g. IgA, IgG, IgM i.e. an α, γ or μ heavy chain),
but will generally be IgG. Within the IgG isotype, antibodies may be IgG1, IgG2, IgG3 or IgG4
subclass. Antibodies of the invention may have a κ or a λ light chain.

Antibodies of the invention can take various forms, including whole antibodies, antibody fragments 
such as F(ab')2 and F(ab) fragments, Fv fragments (non-covalent heterodimers), single-chain
antibodies such as single chain Fv molecules (scFv), minibodies, oligobodies, etc. The term 
"antibody" does not imply any particular origin, and includes antibodies obtained through 
non-conventional processes, such as phage display. 

The invention provides a process for detecting polypeptides of the invention, comprising the steps of: 
(a) contacting an antibody of the invention with a biological sample under conditions suitable for the 
formation of antibody-antigen complexes; and (b) detecting said complexes.

The invention provides a process for detecting antibodies of the invention, comprising the steps of: 
(a) contacting a polypeptide of the invention with a biological sample (e.g. a blood or serum sample)
under conditions suitable for the formation of antibody-antigen complexes; and (b) detecting said
complexes. 

For good cross-reactivity, preferred antibodies of the invention bind to epitopes that
are common to at least two (e.g. 2, 3, 4 or 5) homologous coding sequences, as described in more
detail above. Conversely, for good specificity, other preferred antibodies of the invention bind to 
epitopes that include an amino acid that differs between homologous coding sequences e.g. binds to 
Phe-132 in SEQ ID NO: 17994 to distinguish from SEQ ID NOs: 88, 4374, 8834 and 13214, all of 
which have a Serine residue at position 132. 

Nucleic acids 

The invention provides nucleic acid comprising the GBS nucleotide sequences disclosed in the
examples. These nucleic acid sequences are the odd SEQ ID NOs between 1 and 22739.

The invention also provides nucleic acid comprising nucleotide sequences having sequence identity
to the GBS nucleotide sequences disclosed in the examples. Identity between sequences is preferably
determined by the Smith-Waterman homology search algorithm as described above.

The invention also provides nucleic acid which can hybridize to the GBS nucleic acid disclosed in 
the examples. Hybridization reactions can be performed under conditions of different "stringency".
Conditions that increase stringency of a hybridization reaction are widely known and published in the
art [e.g. page 7.52 of reference 25]. Examples of relevant conditions include (in order of increasing
stringency): incubation temperatures of 25°C, 37°C, 50°C, 55°C and 68°C; buffer concentrations of
10 x SSC, 6 x SSC, 1 x SSC, 0.1 x SSC (where SSC is 0.15 M NaCl and 15 mM citrate buffer) and
their equivalents using other buffer systems; formamide concentrations of 0%, 25%, 50%, and 75%;
incubation times from 5 minutes to 24 hours; 1, 2, or more washing steps; wash incubation times of
1, 2, or 15 minutes; and wash solutions of 6 x SSC, 1 x SSC, 0.1 x SSC, or de-ionized water.
Hybridization techniques and their optimization are well known in the art [e.g. see refs 25-28, etc.]. 

In some embodiments, nucleic acid of the invention hybridizes to a target of the invention under low
stringency conditions; in other embodiments it hybridizes under intermediate stringency conditions;
in preferred embodiments, it hybridizes under high stringency conditions. An exemplary set of low 
stringency hybridization conditions is 50°C and 10 x SSC. An exemplary set of intermediate 
stringency hybridization conditions is 55°C and 1 x SSC. An exemplary set of high stringency 
hybridization conditions is 68°C and 0.1 x SSC. 

Nucleic acid comprising fragments of these sequences is also provided. These should comprise at
least n consecutive nucleotides from the GBS sequences and, depending on the particular sequence, n 
is 10 or more (e.g. 12, 14, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200 or more). 

The invention provides nucleic acid of formula 5'-X-Y-Z-3', wherein: -X- is a nucleotide sequence
consisting of x nucleotides; -Z- is a nucleotide sequence consisting of z nucleotides; -Y- is a
nucleotide sequence consisting of either (a) a fragment of one of the odd-numbered SEQ ID NOs: 1
to 22739, or (b) the complement of (a); and said nucleic acid 5'-X-Y-Z-3' is neither (i) a fragment of
one of the odd-numbered SEQ ID NOs: 1 to 22739 nor (ii) the complement of (i). The -X- and/or -Z- 
moieties may comprise a promoter sequence (or its complement). 

The invention also provides nucleic acid encoding the polypeptides and polypeptide fragments of the 
invention.

The invention includes nucleic acid comprising sequences complementary to the sequences disclosed 
in the sequence listing (e.g. for antisense or probing, or for use as primers), as well as the sequences 
in the orientation actually shown. 

Nucleic acids of the invention can be used in hybridisation reactions (e.g. Northern or Southern blots, 
or in nucleic acid microarrays or 'gene chips') and amplification reactions (e.g. PCR, SDA, SSSR,
LCR, TMA, NASBA, etc.) and other nucleic acid techniques. 

Nucleic acid according to the invention can take various forms (e.g. single-stranded, double-stranded,
vectors, primers, probes, labelled etc.). Nucleic acids of the invention may be circular or branched,
but will generally be linear. Unless otherwise specified or required, any embodiment of the invention
that utilizes a nucleic acid may utilize both the double-stranded form and each of two complementary
single-stranded forms which make up the double-stranded form. Primers and probes are generally
single-stranded, as are antisense nucleic acids. 

Nucleic acids of the invention are preferably provided in purified or substantially purified form i.e. 
substantially free from other nucleic acids (e.g. free from naturally-occurring nucleic acids), 
particularly from other streptococcal or host cell nucleic acids, generally being at least about 50% 
pure (by weight), and usually at least about 90% pure. Nucleic acids of the invention are preferably
GBS nucleic acids. 

Nucleic acids of the invention may be prepared in many ways e.g. by chemical synthesis (e.g. 
phosphoramidite synthesis of DNA) in whole or in part, by digesting longer nucleic acids using 
nucleases (e.g. restriction enzymes), by joining shorter nucleic acids or nucleotides (e.g. using ligases 
or polymerases), from genomic or cDNA libraries, etc.

Nucleic acid of the invention may be attached to a solid support (e.g. a bead, plate, filter, film, slide, 
microarray support, resin, etc.). Nucleic acid of the invention may be labelled e.g. with a radioactive 
or fluorescent label, or a biotin label. This is particularly useful where the nucleic acid is to be used 
in detection techniques e.g. where the nucleic acid is a primer or a probe.

The term "nucleic acid" in general means a polymeric form of nucleotides of any length,
which contains deoxyribonucleotides, ribonucleotides, and/or their analogs. It includes DNA, RNA,
and DNA/RNA hybrids. It also includes DNA or RNA analogs, such as those containing modified
backbones (e.g. peptide nucleic acids (PNAs) or phosphorothioates) or modified bases. Thus the
invention includes mRNA, tRNA, rRNA, ribozymes, DNA, cDNA, recombinant nucleic acids,
branched nucleic acids, plasmids, vectors, probes, primers, etc. Where nucleic acid of the invention
takes the form of RNA, it may or may not have a 5' cap.

Nucleic acids of the invention comprise GBS sequences, but they may also comprise non-GBS 
sequences (e.g. in nucleic acids of formula 5'-X-Y-Z-3', as defined above). This is particularly useful
for primers, which may thus comprise a first sequence complementary to a GBS nucleic acid target
and a second sequence which is not complementary to the nucleic acid target. Any such
non-complementary sequences in the primer are preferably 5' to the complementary sequences.
Typical non-complementary sequences comprise restriction sites or promoter sequences. 

Nucleic acids of the invention can be prepared in many ways e.g. by chemical synthesis (at least in 
part), by digesting longer nucleic acids using nucleases (e.g. restriction enzymes), by joining shorter 
nucleic acids (e.g. using ligases or polymerases), from genomic or cDNA libraries, etc.

Nucleic acids of the invention may be part of a vector i.e. part of a nucleic acid construct designed
for transduction/transfection of one or more cell types. Vectors may be, for example, "cloning
vectors" which are designed for isolation, propagation and replication of inserted nucleotides,
"expression vectors" which are designed for expression of a nucleotide sequence in a host cell, "viral
vectors" which are designed to result in the production of a recombinant virus or virus-like particle, or
"shuttle vectors", which comprise the attributes of more than one type of vector. Preferred vectors
are plasmids. A "host cell" includes an individual cell or cell culture which can be or has been a
recipient of exogenous nucleic acid. Host cells include progeny of a single host cell, and the progeny
may not necessarily be completely identical (in morphology or in total DNA complement) to the
original parent cell due to natural, accidental, or deliberate mutation and/or change. Host cells
include cells transfected or infected in vivo or in vitro with nucleic acid of the invention. 

Where a nucleic acid is DNA, it will be appreciated that "U" in a RNA sequence will be replaced by 
"T" in the DNA. Similarly, where a nucleic acid is RNA, it will be appreciated that "T" in a DNA 
sequence will be replaced by "U" in the RNA. 

The term "complement" or "complementary" when used in relation to nucleic acids refers to Watson-Crick
base pairing. Thus the complement of C is G, the complement of G is C, the complement of A
is T (or U), and the complement of T (or U) is A. It is also possible to use bases such as I (the purine
inosine) e.g. to complement pyrimidines (C or T). The terms also imply a direction - the complement
of 5'-ACAGT-3' is 5'-ACTGT-3' rather than 5'-TGTCA-3'.
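
The directional sense of "complement", and the T/U interchange between DNA and RNA described in the two preceding paragraphs, can be sketched as follows. The function names are hypothetical, and the sketch handles only the four standard DNA bases (no inosine or ambiguity codes).

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement of a DNA strand, written 5'->3' (i.e. the reverse complement)."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def dna_to_rna(dna: str) -> str:
    """Replace T with U to write a DNA sequence as RNA."""
    return dna.upper().replace("T", "U")


def rna_to_dna(rna: str) -> str:
    """Replace U with T to write an RNA sequence as DNA."""
    return rna.upper().replace("U", "T")


print(reverse_complement("ACAGT"))  # ACTGT, matching the example in the text
print(dna_to_rna("ACAGT"))          # ACAGU
```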

Nucleic acids of the invention can be used, for example: to produce polypeptides; as hybridization
probes for the detection of nucleic acid in biological samples; to generate additional copies of the 
nucleic acids; to generate ribozymes or antisense oligonucleotides; as single-stranded DNA primers 
or probes; or as triple-strand forming oligonucleotides. 

The invention provides a process for producing nucleic acid of the invention, wherein the nucleic 
acid is synthesised in part or in whole using chemical means.

The invention provides vectors comprising nucleotide sequences of the invention (e.g. cloning or 
expression vectors) and host cells transformed with such vectors. 

The invention also provides a kit comprising primers (e.g. PCR primers) for amplifying a template
sequence contained within a streptococcus bacterium (e.g. GBS) nucleic acid sequence, the kit
comprising a first primer and a second primer, wherein the first primer is substantially
complementary to said template sequence and the second primer is substantially complementary to a
complement of said template sequence, wherein the parts of said primers which have substantial
complementarity define the termini of the template sequence to be amplified. The first primer and/or
the second primer may include a detectable label (e.g. a fluorescent label).
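
How a primer pair defines the termini of the amplified template can be shown with a small in-silico sketch. It assumes exact (not merely substantial) complementarity, non-overlapping primers, and a template given as one strand written 5'->3'. The function names and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement of a DNA strand, written 5'->3'."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def find_amplicon(template: str, forward_primer: str, reverse_primer: str):
    """Return the region of the given strand bounded by the two primers, or None.

    forward_primer -- complementary to the complement of the template, so it
                      appears verbatim on the given strand
    reverse_primer -- complementary to the given strand, so its reverse
                      complement appears on the given strand
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(site)]


template = "TTTT" + "ATGAAACGT" + "CCCCCC" + "GATTACAGG" + "AAAA"
print(find_amplicon(template, "ATGAAACGT", "CCTGTAATC"))
# ATGAAACGTCCCCCCGATTACAGG
```

In practice primers tolerate mismatches and may carry 5' tails, so a real design tool would score approximate matches rather than use exact string search.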

The invention also provides a kit comprising first and second single-stranded oligonucleotides which
allow amplification of a streptococcal template nucleic acid sequence contained in a single- or
double-stranded nucleic acid (or mixture thereof), wherein: (a) the first oligonucleotide comprises a
primer sequence which is substantially complementary to said template nucleic acid sequence;
(b) the second oligonucleotide comprises a primer sequence which is substantially complementary to 
the complement of said template nucleic acid sequence; (c) the first oligonucleotide and/or the 
second oligonucleotide comprise(s) sequence which is not complementary to said template nucleic 
acid; and (d) said primer sequences define the termini of the template sequence to be amplified. The 
non-complementary sequence(s) of feature (c) are preferably upstream of (i.e. 5' to) the primer 
sequences. One or both of these (c) sequences may comprise a restriction site [e.g. ref. 29] or a 
promoter sequence [e.g. ref. 30]. The first oligonucleotide and/or the second oligonucleotide may include
a detectable label (e.g. a fluorescent label). 

The invention provides a process for detecting nucleic acid of the invention, comprising the steps of: 
(a) contacting a nucleic acid probe according to the invention with a biological sample under hybridising
conditions to form duplexes; and (b) detecting said duplexes. 

The invention provides a process for detecting GBS in a biological sample (e.g. blood), comprising
the step of contacting nucleic acid according to the invention with the biological sample under
hybridising conditions. The process may involve nucleic acid amplification (e.g. PCR, SDA, SSSR,
LCR, TMA, NASBA, etc.) or hybridisation (e.g. microarrays, blots, hybridisation with a probe in
solution etc.). PCR detection of GBS in clinical samples has been reported [e.g. see refs. 31 to 34]. 
Clinical assays based on nucleic acid are described in general in ref. 35. 

The invention provides a process for preparing a fragment of a target sequence, wherein the fragment 
is prepared by extension of a nucleic acid primer. The target sequence and/or the primer are nucleic 
acids of the invention. The primer extension reaction may involve nucleic acid amplification (e.g.
PCR, SDA, SSSR, LCR, TMA, NASBA, etc.). 

Nucleic acid amplification according to the invention may be quantitative and/or real-time. 

For certain embodiments of the invention, nucleic acids are preferably at least 7 nucleotides in length 
(e.g. 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 
180, 190, 200, 225, 250, 275, 300 nucleotides or longer). 

For certain embodiments of the invention, nucleic acids are preferably at most 500 nucleotides in 
length (e.g. 450, 400, 350, 300, 250, 200, 150, 140, 130, 120, 110, 100, 90, 80, 75, 70, 65, 60, 55, 50,
45, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15 
nucleotides or shorter). 

Primers and probes of the invention, and other nucleic acids used for hybridization, are preferably 
between 10 and 30 nucleotides in length (e.g. 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, or 30 nucleotides).

Pharmaceutical compositions

The invention provides compositions comprising: (a) polypeptide, antibody, and/or nucleic acid of 
the invention; and (b) a pharmaceutically acceptable carrier. These compositions may be suitable as 
immunogenic compositions, for instance, or as diagnostic reagents, or as vaccines. Vaccines 
according to the invention may either be prophylactic (i.e. to prevent infection) or therapeutic (i.e. to
treat infection), but will typically be prophylactic. 

A 'pharmaceutically acceptable carrier' includes any carrier that does not itself induce the production 
of antibodies harmful to the individual receiving the composition. Suitable carriers are typically 
large, slowly metabolised macromolecules such as proteins, polysaccharides, polylactic acids,
polyglycolic acids, polymeric amino acids, amino acid copolymers, sucrose, trehalose, lactose, and
lipid aggregates (such as oil droplets or liposomes). Such carriers are well known to those of ordinary 
skill in the art. The vaccines may also contain diluents, such as water, saline, glycerol, etc. 
Additionally, auxiliary substances, such as wetting or emulsifying agents, pH buffering substances, 
and the like, may be present. Sterile pyrogen-free, phosphate-buffered physiologic saline is a typical
carrier. A thorough discussion of pharmaceutically acceptable excipients is available in ref. 155.

Compositions of the invention may include an antimicrobial, particularly if packaged in a multiple 
dose format. 

Compositions of the invention may comprise detergent e.g. a Tween (polysorbate), such as Tween 
80. Detergents are generally present at low levels e.g. <0.01%. 

Compositions of the invention may include sodium salts (e.g. sodium chloride) to give tonicity. A
concentration of 10±2mg/ml NaCl is typical. 

Compositions of the invention will generally include a buffer. A phosphate buffer is typical. 

Compositions of the invention may comprise a sugar alcohol (e.g. mannitol) or a disaccharide (e.g. 
sucrose or trehalose) e.g. at around 15-30mg/ml (e.g. 25 mg/ml), particularly if they are to be 
lyophilised or if they include material which has been reconstituted from lyophilised material. The
pH of a composition for lyophilisation may be adjusted to around 6.1 prior to lyophilisation. 

Polypeptides of the invention may be administered in conjunction with other immunoregulatory 
agents. In particular, compositions will usually include a vaccine adjuvant. Adjuvants which may be
used in compositions of the invention include, but are not limited to: 

A. Mineral-containing compositions

Mineral containing compositions suitable for use as adjuvants in the invention include mineral salts, 
such as aluminium salts and calcium salts. The invention includes mineral salts such as hydroxides 
(e.g. oxyhydroxides), phosphates (e.g. hydroxyphosphates, orthophosphates), sulphates, etc. [e.g. see 
chapters 8 & 9 of ref. 36], or mixtures of different mineral compounds (e.g. a mixture of a phosphate
and a hydroxide adjuvant, optionally with an excess of the phosphate), with the compounds taking
any suitable form (e.g. gel, crystalline, amorphous, etc.), and with adsorption to the salt(s) being
preferred. Mineral containing compositions may also be formulated as a particle of metal salt [37]. 

Aluminum salts may be included in vaccines of the invention such that the dose of Al³⁺ is between
0.2 and 1.0 mg per dose.

A typical aluminium phosphate adjuvant is amorphous aluminium hydroxyphosphate with PO4/Al
molar ratio between 0.84 and 0.92, included at 0.6mg Al³⁺/ml. Adsorption with a low dose of
aluminium phosphate may be used e.g. between 50 and 100µg Al³⁺ per conjugate per dose. Where an
aluminium phosphate is used and it is desired not to adsorb an antigen to the adjuvant, this is
favoured by including free phosphate ions in solution (e.g. by the use of a phosphate buffer).
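
As a worked illustration of the figures above, the following Python sketch computes the Al³⁺ content of a dose from the adjuvant concentration and the dose volume, and checks it against the 0.2 to 1.0 mg range. The function names are hypothetical, and the 0.5 ml volume is the typical injection volume mentioned elsewhere in this description, used here as an assumption.

```python
import math

# Per-dose Al3+ range (mg) stated in the text above.
AL_DOSE_RANGE_MG = (0.2, 1.0)


def al_per_dose_mg(conc_mg_per_ml, dose_volume_ml):
    """Al3+ mass per dose = concentration (mg/ml) x dose volume (ml)."""
    return conc_mg_per_ml * dose_volume_ml


def within_range(al_mg, bounds=AL_DOSE_RANGE_MG):
    """True if the Al3+ mass falls inside the stated per-dose range."""
    low, high = bounds
    return low <= al_mg <= high


if __name__ == "__main__":
    # 0.6 mg Al3+/ml (typical aluminium phosphate level) in an assumed 0.5 ml dose
    dose = al_per_dose_mg(0.6, 0.5)
    print(f"{dose:.2f} mg Al3+ per dose, within range: {within_range(dose)}")
```

Under these assumptions the dose contains about 0.3 mg Al³⁺, which lies inside the stated range.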

B. Oil Emulsions

Oil emulsion compositions suitable for use as adjuvants in the invention include squalene-water 
emulsions, such as MF59 (5% Squalene, 0.5% Tween 80, and 0.5% Span 85, formulated into 
submicron particles using a microfluidizer) [Chapter 10 of ref. 36; see also refs. 38-40]. MF59 is
used as the adjuvant in the FLUAD™ influenza virus trivalent subunit vaccine. 

Particularly preferred adjuvants for use in the compositions are submicron oil-in-water emulsions.
Preferred submicron oil-in-water emulsions for use herein are squalene/water emulsions optionally 
containing varying amounts of MTP-PE, such as a submicron oil-in-water emulsion containing 4-5% 
w/v squalene, 0.25-1.0% w/v Tween 80 (polyoxyethylenesorbitan monooleate), and/or 0.25-1.0% 
Span 85 (sorbitan trioleate), and, optionally, N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-
2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine (MTP-PE). Submicron
oil-in-water emulsions, methods of making the same and immunostimulating agents, such as 
muramyl peptides, for use in the compositions, are described in detail in references 38 & 41-42. 

Complete Freund's adjuvant (CFA) and incomplete Freund's adjuvant (IFA) may also be used as
adjuvants in the invention. 

C. Saponin formulations [chapter 22 of ref. 36]

Saponin formulations may also be used as adjuvants in the invention. Saponins are a heterologous 
group of sterol glycosides and triterpenoid glycosides that are found in the bark, leaves, stems, roots 
and even flowers of a wide range of plant species. Saponins isolated from the bark of the Quillaja
saponaria Molina tree have been widely studied as adjuvants. Saponin can also be commercially
obtained from Smilax ornata (sarsaparilla), Gypsophila paniculata (brides veil), and Saponaria
officinalis (soap root). Saponin adjuvant formulations include purified formulations, such as QS21,
as well as lipid formulations, such as ISCOMs. 

Saponin compositions have been purified using HPLC and RP-HPLC. Specific purified fractions 
using these techniques have been identified, including QS7, QS17, QS18, QS21, QH-A, QH-B and 
QH-C. Preferably, the saponin is QS21. A method of production of QS21 is disclosed in ref. 43.
Saponin formulations may also comprise a sterol, such as cholesterol [44].

Combinations of saponins and cholesterols can be used to form unique particles called
immunostimulating complexes (ISCOMs) [chapter 23 of ref. 36]. ISCOMs typically also include a
phospholipid such as phosphatidylethanolamine or phosphatidylcholine. Any known saponin can be
used in ISCOMs. Preferably, the ISCOM includes one or more of QuilA, QHA and QHC. ISCOMs
are further described in refs. 44-46. Optionally, the ISCOMs may be devoid of additional
detergent(s) [47]. 

A review of the development of saponin based adjuvants can be found in refs. 48 & 49. 

D. Virosomes and virus-like particles 

Virosomes and virus-like particles (VLPs) can also be used as adjuvants in the invention. These 
structures generally contain one or more proteins from a virus optionally combined or formulated
with a phospholipid. They are generally non-pathogenic, non-replicating and generally do not contain 
any of the native viral genome. The viral proteins may be recombinantly produced or isolated from 
whole viruses. These viral proteins suitable for use in virosomes or VLPs include proteins derived 
from influenza virus (such as HA or NA), Hepatitis B virus (such as core or capsid proteins), 
Hepatitis E virus, measles virus, Sindbis virus, Rotavirus, Foot-and-Mouth Disease virus, Retrovirus,
Norwalk virus, human Papilloma virus, HIV, RNA-phages, Qβ-phage (such as coat proteins), GA-phage,
fr-phage, AP205 phage, and Ty (such as retrotransposon Ty protein p1). VLPs are discussed
further in refs. 50-55. Virosomes are discussed further in, for example, ref. 56.

E. Bacterial or microbial derivatives 

Adjuvants suitable for use in the invention include bacterial or microbial derivatives such as
non-toxic derivatives of enterobacterial lipopolysaccharide (LPS), Lipid A derivatives, 
immunostimulatory oligonucleotides and ADP-ribosylating toxins and detoxified derivatives thereof. 

Non-toxic derivatives of LPS include monophosphoryl lipid A (MPL) and 3-O-deacylated MPL 
(3dMPL). 3dMPL is a mixture of 3 de-O-acylated monophosphoryl lipid A with 4, 5 or 6 acylated
chains. A preferred "small particle" form of 3 de-O-acylated monophosphoryl lipid A is disclosed in
ref. 57. Such "small particles" of 3dMPL are small enough to be sterile filtered through a 0.22µm
membrane [57]. Other non-toxic LPS derivatives include monophosphoryl lipid A mimics, such as 
aminoalkyl glucosaminide phosphate derivatives e.g. RC-529 [58,59]. 

Lipid A derivatives include derivatives of lipid A from Escherichia coli such as OM-174. OM-174 is 
described for example in refs. 60 & 61.

Immunostimulatory oligonucleotides suitable for use as adjuvants in the invention include nucleotide 
sequences containing a CpG motif (a dinucleotide sequence containing an unmethylated cytosine 
linked by a phosphate bond to a guanosine). Double-stranded RNAs and oligonucleotides containing 
palindromic or poly(dG) sequences have also been shown to be immunostimulatory. 

The CpG's can include nucleotide modifications/analogs such as phosphorothioate modifications and
can be double-stranded or single-stranded. References 62, 63 and 64 disclose possible analog
substitutions e.g. replacement of guanosine with 2'-deoxy-7-deazaguanosine. The adjuvant effect of
CpG oligonucleotides is further discussed in refs. 65-70.

The CpG sequence may be directed to TLR9, such as the motif GTCGTT or TTCGTT [71]. The 
CpG sequence may be specific for inducing a Th1 immune response, such as a CpG-A ODN, or it
may be more specific for inducing a B cell response, such as a CpG-B ODN. CpG-A and CpG-B ODNs
are discussed in refs. 72-74. Preferably, the CpG is a CpG-A ODN.

Preferably, the CpG oligonucleotide is constructed so that the 5' end is accessible for receptor 
recognition. Optionally, two CpG oligonucleotide sequences may be attached at their 3' ends to form
"immunomers". See, for example, refs. 71 & 75-77. 
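
A sequence can be screened for the TLR9-directed motifs mentioned above with a simple string search, sketched below in Python. The function name and the test sequence are hypothetical, and a plain sequence scan says nothing about methylation status, backbone chemistry or actual immunostimulatory activity.

```python
# Motifs taken from the text above; everything else is illustrative.
TLR9_MOTIFS = ("GTCGTT", "TTCGTT")


def find_cpg_motifs(seq, motifs=TLR9_MOTIFS):
    """Return sorted (0-based position, motif) pairs for every motif occurrence.

    Overlapping occurrences are reported because the search restarts one
    base after each hit.
    """
    seq = seq.upper()
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((start, motif))
            start = seq.find(motif, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical oligonucleotide used only to exercise the function.
    for position, motif in find_cpg_motifs("AAGTCGTTAATTCGTTAA"):
        print(position, motif)
```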

Bacterial ADP-ribosylating toxins and detoxified derivatives thereof may be used as adjuvants in the
invention. Preferably, the protein is derived from E.coli (E.coli heat labile enterotoxin "LT"), cholera
("CT"), or pertussis ("PT"). The use of detoxified ADP-ribosylating toxins as mucosal adjuvants is
described in ref. 78 and as parenteral adjuvants in ref. 79. The toxin or toxoid is preferably in the
form of a holotoxin, comprising both A and B subunits. Preferably, the A subunit contains a
detoxifying mutation; preferably the B subunit is not mutated. Preferably, the adjuvant is a detoxified
LT mutant such as LT-K63, LT-R72, and LT-G192. The use of ADP-ribosylating toxins and
detoxified derivatives thereof, particularly LT-K63 and LT-R72, as adjuvants can be found in refs.
80-87. Numerical reference for amino acid substitutions is preferably based on the alignments of the
A and B subunits of ADP-ribosylating toxins set forth in ref. 88, specifically incorporated herein by
reference in its entirety.

F. Human immunomodulators

Human immunomodulators suitable for use as adjuvants in the invention include cytokines, such as 
interleukins (e.g. IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12 [89], etc.) [90], interferons (e.g. interferon-γ),
macrophage colony stimulating factor, and tumor necrosis factor.

G. Bioadhesives and Mucoadhesives

Bioadhesives and mucoadhesives may also be used as adjuvants in the invention. Suitable 
bioadhesives include esterified hyaluronic acid microspheres [91] or mucoadhesives such as 
cross-linked derivatives of poly(acrylic acid), polyvinyl alcohol, polyvinyl pyrrolidone,
polysaccharides and carboxymethylcellulose. Chitosan and derivatives thereof may also be used as
adjuvants in the invention [92].

H. Microparticles 

Microparticles may also be used as adjuvants in the invention. Microparticles (i.e. a particle of 
~100nm to ~150µm in diameter, more preferably ~200nm to ~30µm in diameter, and most preferably
~500nm to ~10µm in diameter) formed from materials that are biodegradable and non-toxic (e.g. a
poly(α-hydroxy acid), a polyhydroxybutyric acid, a polyorthoester, a polyanhydride, a
polycaprolactone, etc.), with poly(lactide-co-glycolide) are preferred, optionally treated to have a
negatively-charged surface (e.g. with SDS) or a positively-charged surface (e.g. with a cationic
detergent, such as CTAB).

I. Liposomes (Chapters 13 & 14 of ref. 36)

Examples of liposome formulations suitable for use as adjuvants are described in refs. 93-95.

J. Polyoxyethylene ether and polyoxyethylene ester formulations

Adjuvants suitable for use in the invention include polyoxyethylene ethers and polyoxyethylene 
esters [96]. Such formulations further include polyoxyethylene sorbitan ester surfactants in 
combination with an octoxynol [97] as well as polyoxyethylene alkyl ethers or ester surfactants in 
combination with at least one additional non-ionic surfactant such as an octoxynol [98]. Preferred 
polyoxyethylene ethers are selected from the following group: polyoxyethylene-9-lauryl ether
(laureth 9), polyoxyethylene-9-stearyl ether, polyoxyethylene-8-stearyl ether, polyoxyethylene-4-
lauryl ether, polyoxyethylene-35-lauryl ether, and polyoxyethylene-23-lauryl ether.

K. Polyphosphazene (PCPP)

PCPP formulations are described, for example, in refs. 99 and 100.

L. Muramyl peptides

Examples of muramyl peptides suitable for use as adjuvants in the invention include
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl-normuramyl-L-alanyl-D-isoglutamine
(nor-MDP), and N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-
hydroxyphosphoryloxy)-ethylamine (MTP-PE).

M. Imidazoquinolone Compounds.

Examples of imidazoquinolone compounds suitable for use as adjuvants in the invention include
Imiquimod and its homologues (e.g. "Resiquimod 3M"), described further in refs. 101 and 102.

N. Thiosemicarbazone Compounds.

Examples of thiosemicarbazone compounds, as well as methods of formulating, manufacturing, and 
screening for compounds all suitable for use as adjuvants in the invention include those described in
ref. 103. The thiosemicarbazones are particularly effective in the stimulation of human peripheral
blood mononuclear cells for the production of cytokines, such as TNF-α.

O. Tryptanthrin Compounds. 

Examples of tryptanthrin compounds, as well as methods of formulating, manufacturing, and 
screening for compounds all suitable for use as adjuvants in the invention include those described in
ref. 104. The tryptanthrin compounds are particularly effective in the stimulation of human
peripheral blood mononuclear cells for the production of cytokines, such as TNF-α.

The invention may also comprise combinations of aspects of one or more of the adjuvants identified 
above. For example, the following combinations may be used as adjuvant compositions in the 
invention: (1) a saponin and an oil-in-water emulsion [105]; (2) a saponin (e.g. QS21) + a non-toxic
LPS derivative (e.g. 3dMPL) [106]; (3) a saponin (e.g. QS21) + a non-toxic LPS derivative (e.g.
3dMPL) + a cholesterol; (4) a saponin (e.g. QS21) + 3dMPL + IL-12 (optionally + a sterol) [107];
(5) combinations of 3dMPL with, for example, QS21 and/or oil-in-water emulsions [108]; (6) SAF,
containing 10% squalane, 0.4% Tween 80™, 5% pluronic-block polymer L121, and thr-MDP, either
microfluidized into a submicron emulsion or vortexed to generate a larger particle size emulsion; (7)
Ribi™ adjuvant system (RAS) (Ribi Immunochem) containing 2% squalene, 0.2% Tween 80, and
one or more bacterial cell wall components from the group consisting of monophosphoryl lipid A
(MPL), trehalose dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL + CWS
(Detox™); (8) one or more mineral salts (such as an aluminum salt) + a non-toxic derivative of LPS
(such as 3dMPL); and (9) one or more mineral salts (such as an aluminum salt) + an
immunostimulatory oligonucleotide (such as a nucleotide sequence including a CpG motif).

Other substances that act as immunostimulating agents are disclosed in chapter 7 of ref. 36. 

The use of an aluminium hydroxide or aluminium phosphate adjuvant is particularly preferred, and 
antigens are generally adsorbed to these salts. Calcium phosphate is another preferred adjuvant. 

The pH of compositions of the invention is preferably between 6 and 8, preferably about 7. Stable pH
may be maintained by the use of a buffer. Where a composition comprises an aluminium hydroxide 
salt, it is preferred to use a histidine buffer [109]. The composition may be sterile and/or 
pyrogen-free. Compositions of the invention may be isotonic with respect to humans. 

Compositions may be presented in vials, or they may be presented in ready-filled syringes. The 
syringes may be supplied with or without needles. A syringe will include a single dose of the
composition, whereas a vial may include a single dose or multiple doses. Injectable compositions 
will usually be liquid solutions or suspensions. Alternatively, they may be presented in solid form 
(e.g. freeze-dried) for solution or suspension in liquid vehicles prior to injection. 

Compositions of the invention may be packaged in unit dose form or in multiple dose form. For 
multiple dose forms, vials are preferred to pre-filled syringes. Effective dosage volumes can be
routinely established, but a typical human dose of the composition for injection has a volume of 
0.5ml. 

Where a composition of the invention is to be prepared extemporaneously prior to use (e.g. where a
component is presented in lyophilised form) and is presented as a kit, the kit may comprise two vials,
or it may comprise one ready-filled syringe and one vial, with the contents of the syringe being used
to reactivate the contents of the vial prior to injection. 

Immunogenic compositions used as vaccines comprise an immunologically effective amount of 
antigen(s), as well as any other components, as needed. By 'immunologically effective amount', it is
meant that the administration of that amount to an individual, either in a single dose or as part of a
series, is effective for treatment or prevention. This amount varies depending upon the health and
physical condition of the individual to be treated, age, the taxonomic group of individual to be treated
(e.g. non-human primate, primate, etc.), the capacity of the individual's immune system to synthesise
antibodies, the degree of protection desired, the formulation of the vaccine, the treating doctor's
assessment of the medical situation, and other relevant factors. It is expected that the amount will fall
in a relatively broad range that can be determined through routine trials, and a typical quantity of
each meningococcal saccharide antigen per dose is between 1µg and 10mg per antigen.

Pharmaceutical uses 

The invention also provides a method of treating a patient, comprising administering to the patient a 
therapeutically effective amount of a composition of the invention. The patient may either be at risk 
from the disease themselves or may be a pregnant woman ('maternal immunisation' [110]). 

The invention provides nucleic acid, polypeptide, or antibody of the invention for use as
medicaments (e.g. as immunogenic compositions or as vaccines) or as diagnostic reagents. It also
provides the use of nucleic acid, polypeptide, or antibody of the invention in the manufacture of: (i) a
medicament for treating or preventing disease and/or infection caused by GBS; (ii) a diagnostic
reagent for detecting the presence of GBS or of antibodies raised against GBS; and/or (iii) a reagent
which can raise antibodies against GBS. Said GBS can be of any serotype or strain. Said disease may
be, for instance, bacteremia, meningitis, puerperal fever, scarlet fever, erysipelas, pharyngitis, 
impetigo, necrotising fasciitis, myositis or toxic shock syndrome. 

The patient is preferably a human. Where the vaccine is for prophylactic use, the human is preferably 
an adolescent (e.g. aged between 10 and 20 years); where the vaccine is for therapeutic use, the 
human is preferably an adult. A vaccine intended for children or adolescents may also be
administered to adults e.g. to assess safety, dosage, immunogenicity, etc. 

One way of checking efficacy of therapeutic treatment involves monitoring GBS infection after 
administration of the composition of the invention. One way of checking efficacy of prophylactic 
treatment involves monitoring immune responses against an administered polypeptide after
administration. Immunogenicity of compositions of the invention can be determined by
administering them to test subjects (e.g. children 12-16 months of age, or animal models e.g. a mouse
model) and then determining standard parameters including ELISA titres (GMT) of IgG. These
immune responses will generally be determined around 4 weeks after administration of the
composition, and compared to values determined before administration of the composition. Where
more than one dose of the composition is administered, more than one post-administration
determination may be made. A mouse neonatal sepsis model for protective efficacy against GBS 
infection is known e.g. see ref. 111. 

Administration of polypeptide antigens is a preferred method of treatment for inducing immunity. 
Administration of antibodies of the invention is another preferred method of treatment. This method 
of passive immunisation is particularly useful for newborn children or for pregnant women. This
method will typically use monoclonal antibodies, which will be humanised or fully human.

Preferred compositions for use in immunisation include more than one GBS polypeptide. Multiple
antigens can be included as separate admixed polypeptides in a single composition, and/or can be 
part of a hybrid polypeptide as described above. Preferred combinations of antigens include at least 
one (e.g. 1, 2, 3, 4, 5, 6 or more) 'core' polypeptide (as described below; Table V) and at least one 
(e.g. 1, 2, 3, 4, 5, 6 or more) 'variable' polypeptide (as described below; Table VI). Mixtures of one
core polypeptide with more than one variable polypeptide are preferred. Examples of these
combinations, using the nomenclature of reference 2, include (a) GBS322 (a core antigen) plus
GBS80, GBS104 & GBS67 (all variable antigens); and (b) GBS322 plus GBS80 & GBS104. In 
some embodiments, this specific 3-valent combination [112] and this specific 4-valent combination 
[113] are excluded from the invention, although they illustrate the principle of combining core and
variable antigens. 

Compositions of the invention will generally be administered directly to a patient. Direct delivery 
may be accomplished by parenteral injection (e.g. subcutaneously, intraperitoneally, intravenously,
intramuscularly, or to the interstitial space of a tissue), or by rectal, oral, vaginal, topical,
transdermal, intranasal, sublingual, ocular, aural, pulmonary or other mucosal administration.
Intramuscular administration to the thigh or the upper arm is preferred. Injection may be via a needle 
(e.g. a hypodermic needle), but needle-free injection may alternatively be used. A typical 
intramuscular dose is 0.5 ml. 

The invention may be used to elicit systemic and/or mucosal immunity. 

Dosage treatment can be a single dose schedule or a multiple dose schedule. Multiple doses may be
used in a primary immunisation schedule and/or in a booster immunisation schedule. A primary dose
schedule may be followed by a booster dose schedule. Suitable timing between priming doses (e.g.
between 4-16 weeks), and between priming and boosting, can be routinely determined.

Bacterial infections affect various areas of the body and so compositions may be prepared in various 
forms. For example, the compositions may be prepared as injectables, either as liquid solutions or
suspensions. Solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection
can also be prepared (e.g. a lyophilised composition). The composition may be prepared for topical
administration e.g. as an ointment, cream or powder. The composition may be prepared for oral
administration e.g. as a tablet or capsule, or as a syrup (optionally flavoured). The composition may
be prepared for pulmonary administration e.g. as an inhaler, using a fine powder or a spray. The
composition may be prepared as a suppository or pessary. The composition may be prepared for
nasal, aural or ocular administration e.g. as spray, drops, gel or powder [e.g. refs 114 & 115].

Further antigenic components of compositions of the invention 

The invention also provides a composition comprising a polypeptide of the invention and one or
more of the following further antigens:

- a saccharide antigen from N.meningitidis serogroup A, C, W135 and/or Y (preferably all
four), such as the oligosaccharide disclosed in ref. 116 from serogroup C [see also ref. 117] or
the oligosaccharides of ref. 118.

- a saccharide antigen from Streptococcus pneumoniae [e.g. 119, 120, 121].

- an antigen from hepatitis A virus, such as inactivated virus [e.g. 122, 123].

- an antigen from hepatitis B virus, such as the surface and/or core antigens [e.g. 123, 124]. 

- a diphtheria antigen, such as a diphtheria toxoid [e.g. chapter 3 of ref. 125] e.g. the CRM197
mutant [e.g. 126]. 

- a tetanus antigen, such as a tetanus toxoid [e.g. chapter 4 of ref. 125].

- an antigen from Bordetella pertussis, such as pertussis holotoxin (PT) and filamentous
haemagglutinin (FHA) from B. pertussis, optionally also in combination with pertactin and/or 
agglutinogens 2 and 3 [e.g. refs. 127 & 128]. 

- a saccharide antigen from Haemophilus influenzae B [e.g. 117]. 

- polio antigen(s) [e.g. 129, 130] such as IPV. 

- measles, mumps and/or rubella antigens [e.g. chapters 9, 10 & 11 of ref. 125].

- influenza antigen(s) [e.g. chapter 19 of ref 125], such as the haemagglutinin and/or 
neuraminidase surface proteins. 

- an antigen from Moraxella catarrhalis [e.g. 131]. 

- a saccharide antigen from Streptococcus agalactiae (group B streptococcus). 

- an antigen from Streptococcus pyogenes (group A streptococcus) [e.g. 132, 133, 134].

- an antigen from Staphylococcus aureus [e.g. 135]. 

The composition may comprise one or more of these further antigens. 

In another embodiment, the GBS antigens of the invention are combined with one or more additional, 
non-GBS antigens suitable for use in a vaccine designed to protect elderly or immunocompromised 
individuals. For example, the GBS antigens may be combined with an antigen derived from the group
consisting of Enterococcus faecalis, Staphylococcus aureus, Staphylococcus epidermidis, Pseudomonas
aeruginosa, Legionella pneumophila, Listeria monocytogenes, Neisseria meningitidis, influenza, and
Parainfluenza virus ('PIV').

Toxic protein antigens may be detoxified where necessary (e.g. detoxification of pertussis toxin by
chemical and/or genetic means [128]).

Where a diphtheria antigen is included in the composition it is preferred also to include tetanus 
antigen and pertussis antigens. Similarly, where a tetanus antigen is included it is preferred also to 
include diphtheria and pertussis antigens. Similarly, where a pertussis antigen is included it is 
preferred also to include diphtheria and tetanus antigens. DTP combinations are thus preferred. 

Saccharide antigens are preferably in the form of conjugates. Carrier proteins for the conjugates
include bacterial toxins (such as diphtheria toxoid or tetanus toxoid), the N. meningitidis outer
membrane protein [136], synthetic peptides [137,138], heat shock proteins [139,140], pertussis
proteins [141,142], protein D from H.influenzae [143,144], cytokines [145], lymphokines [145], H.
influenzae proteins, hormones [145], growth factors [145], toxin A or B from C.difficile [146], iron-
uptake proteins [147], artificial proteins comprising multiple human CD4+ T cell epitopes from
various pathogen-derived antigens [148] such as the N19 protein [149], pneumococcal surface
protein PspA [150], pneumolysin [151], etc. A preferred carrier protein is the CRM197 protein [152].

Antigens in the composition will typically be present at a concentration of at least 1µg/ml each. In
general, the concentration of any given antigen will be sufficient to elicit an immune response against 
that antigen. 

As an alternative to using protein antigens in the immunogenic compositions of the invention,
nucleic acid (preferably DNA e.g. in the form of a plasmid) encoding the antigen may be used.

Antigens are preferably adsorbed to an aluminium salt.

Screening methods

The invention provides a process for determining whether a test compound binds to a polypeptide of 
the invention. If a test compound binds to a polypeptide of the invention and this binding inhibits the
life cycle of the GBS bacterium, then the test compound can be used as an antibiotic or as a lead
compound for the design of antibiotics. The process will typically comprise the steps of contacting a
test compound with a polypeptide of the invention, and determining whether the test compound binds
to said polypeptide. Preferred polypeptides of the invention for use in these processes are enzymes
(e.g. tRNA synthetases), membrane transporters and ribosomal polypeptides. Suitable test
compounds include polypeptides, carbohydrates, lipids, nucleic acids (e.g. DNA, RNA,
and modified forms thereof), as well as small organic compounds (e.g. MW between 200 and 2000
Da). The test compounds may be provided individually, but will typically be part of a library (e.g. a
combinatorial library). Methods for detecting a binding interaction include NMR, filter-binding
assays, gel-retardation assays, displacement assays, surface plasmon resonance, reverse two-hybrid
etc. A compound which binds to a polypeptide of the invention can be tested for antibiotic activity by
contacting the compound with GBS bacteria and then monitoring for inhibition of growth. The 
invention also provides a compound identified using these methods. 

Preferably, the process comprises the steps of: (a) contacting a polypeptide of the invention with one
or more candidate compounds to give a mixture; (b) incubating the mixture to allow the polypeptide
and the candidate compound(s) to interact; and (c) assessing whether the candidate compound binds to
the polypeptide or modulates its activity.

Once a candidate compound has been identified in vitro as a compound that binds to a polypeptide of
the invention then it may be desirable to perform further experiments to confirm the in vivo function
of the compound in inhibiting bacterial growth and/or survival. Thus the method may comprise the
further step of contacting the compound with a GBS bacterium and assessing its effect.
The polypeptide used in the screening process may be free in solution, affixed to a solid support,
located on a cell surface or located intracellularly. Preferably, the binding of a candidate compound 
to the polypeptide is detected by means of a label directly or indirectly associated with the candidate 
compound. The label may be a fluorophore, radioisotope, or other detectable label. 

Preferred polypeptides for use in these screening methods are the 'core' sequences identified below.
General 

The invention provides a computer-readable medium (e.g. a floppy disk, a hard disk, a CD-ROM, a 
DVD etc.) and/or a computer memory and/or a computer database containing one or more of the 
sequences in the sequence listing. 

The term "comprising" encompasses "including" as well as "consisting" e.g. a composition 
"comprising" X may consist exclusively of X or may include something additional e.g. X + Y. 

The term "about" in relation to a numerical value x means, for example, x±10%.

The word "substantially" does not exclude "completely" e.g. a composition which is "substantially 
free" from Y may be completely free from Y. Where necessary, the word "substantially" may be 
omitted from the definition of the invention. 

The N-terminus residues in the amino acid sequences in the sequence listing are given as the amino 
acid encoded by the first codon in the corresponding nucleotide sequence. Where the first codon is 
not ATG, it will be understood that it will be translated as methionine when the codon is a start 
codon, but will be translated as the indicated non-Met amino acid when the sequence is at the 
C-terminus of a fusion partner. The invention specifically discloses and encompasses each of the 
amino acid sequences of the sequence listing having an N-terminus methionine residue (e.g. a
formyl-methionine residue) in place of any indicated non-Met residue. It also specifically discloses 
and encompasses each of the amino acid sequences of the sequence listing starting at any internal 
methionine residues in the sequences. 

As indicated in the above text, nucleic acids and polypeptides of the invention may include 
sequences that: 

(a) are identical (i.e. 100% identical) to the sequences disclosed in the sequence listing; 

(b) share sequence identity with the sequences disclosed in the sequence listing; 

(c) have 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 single nucleotide or amino acid alterations (deletions, 
insertions, substitutions), which may be at separate locations or may be contiguous, as 
compared to the sequences of (a) or (b); and 

(d) when aligned with a particular sequence from the sequence listing using a pairwise alignment
algorithm, have at least x·y identical aligned monomers in each moving window of x monomers
(amino acids or nucleotides) moving from start (N-terminus or 5') to end (C-terminus or 3'),
such that for an alignment that extends to p monomers (where p>x) there are p-x+1 such
windows, where: x is selected from 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150,
200; y is selected from 0.50, 0.60, 0.70, 0.75, 0.80, 0.85, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96,
0.97, 0.98, 0.99; and if x·y is not an integer then it is rounded up to the nearest integer. The
preferred pairwise alignment algorithm is the Needleman-Wunsch global alignment algorithm
[153], using default parameters (e.g. with Gap opening penalty = 10.0, and with Gap extension
penalty = 0.5, using the EBLOSUM62 scoring matrix). This algorithm is conveniently
implemented in the needle tool in the EMBOSS package [154].
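
By way of illustration only, the moving-window test of item (d) might be sketched as follows. The sketch assumes that a pairwise alignment has already been produced elsewhere (e.g. by the needle tool) and is supplied as two gapped strings of equal length, with '-' as the gap character. The function names are hypothetical, and y is passed as a whole-number percentage (an assumption of this sketch) so that the rounding-up of x·y can be done in exact integer arithmetic.

```python
def min_identical(x, y_percent):
    """Threshold x*y, rounded up to the nearest integer (y given in percent)."""
    return -(-x * y_percent // 100)  # integer ceiling division


def all_windows_pass(aligned_a, aligned_b, x, y_percent):
    """Check every window of x alignment columns for at least x*y identities.

    aligned_a and aligned_b are the two rows of a pairwise alignment
    (same length p, '-' marks a gap). There are p - x + 1 windows.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    p = len(aligned_a)
    if p < x:
        raise ValueError("alignment is shorter than the window")
    threshold = min_identical(x, y_percent)
    # 1 where the column holds two identical monomers, 0 otherwise
    matches = [int(a == b and a != "-") for a, b in zip(aligned_a, aligned_b)]
    count = sum(matches[:x])  # identities in the first window
    if count < threshold:
        return False
    for start in range(1, p - x + 1):
        # slide the window one column to the right
        count += matches[start + x - 1] - matches[start - 1]
        if count < threshold:
            return False
    return True
```

For example, with x = 25 and y = 0.91 the threshold is 23 identical monomers per window, since 22.75 is rounded up.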

The nucleic acids and polypeptides of the invention may additionally have further sequences to the
N-terminus/5' and/or C-terminus/3' of these sequences (a) to (d).

The practice of the present invention will employ, unless otherwise indicated, conventional methods
of chemistry, biochemistry, molecular biology, immunology and pharmacology, within the skill of 
the art. Such techniques are explained fully in the literature. See, e.g., references 155-162, etc. 

BRIEF DESCRIPTION OF DRAWINGS 

There are no drawings. 

MODES FOR CARRYING OUT THE INVENTION

Genome sequencing has been carried out on five strains of GBS from different serotypes: '18RS21'
(type II; MLST type ST19), '515' (type Ia; MLST type ST23), 'CJB111' (type V; MLST type ST1),
'COH1' (type III; MLST type ST17) and 'H36B' (type Ib; MLST type ST6). Different numbers of
coding sequences were identified in the five genomes:



Strain      | 18RS21 | 515  | CJB111 | COH1 | H36B
Coding seqs | 2151   | 2249 | 2167   | 2410 | 2393



These 11370 coding sequences are given in the sequence listing together with their inferred
translation products. Annotation of these polypeptide sequences is given in Table I. 

The sequence listing gives sequences in pairs, such that an odd-numbered sequence 'n' is a DNA
coding sequence and the even-numbered sequence 'n+1' is the corresponding amino acid sequence:



Strain     | 18RS21 | 515       | CJB111     | COH1        | H36B
SEQ ID NOs | 1-4302 | 4303-8800 | 8801-13134 | 13135-17954 | 17955-22740
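
As a non-limiting sketch, the pairing rule and the ranges tabulated above can be expressed in a few lines of Python. The dictionary and function names are hypothetical and are introduced purely for illustration; only the ranges and the odd/even rule come from the text.

```python
# SEQ ID NO ranges per strain, copied from the table above.
SEQ_ID_RANGES = {
    "18RS21": (1, 4302),
    "515": (4303, 8800),
    "CJB111": (8801, 13134),
    "COH1": (13135, 17954),
    "H36B": (17955, 22740),
}


def describe_seq_id(seq_id):
    """Return (strain, kind, partner SEQ ID NO) for a SEQ ID NO."""
    for strain, (first, last) in SEQ_ID_RANGES.items():
        if first <= seq_id <= last:
            if seq_id % 2 == 1:
                # odd number: DNA coding sequence, translation is n+1
                return strain, "DNA", seq_id + 1
            # even number: amino acid sequence, coding sequence is n-1
            return strain, "amino acid", seq_id - 1
    raise ValueError("SEQ ID NO %d is outside 1-22740" % seq_id)


def coding_sequences_per_strain():
    """Each coding sequence occupies two consecutive SEQ ID NOs."""
    return {s: (last - first + 1) // 2 for s, (first, last) in SEQ_ID_RANGES.items()}
```

Halving each range in this way reproduces the per-strain counts given earlier (2151, 2249, 2167, 2410 and 2393), which sum to 11370.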



The polypeptides and their epitopes can be used as antigens e.g. in vaccines or diagnostic tests.

Homologous coding sequences between strains are shown in Table II (listing SEQ ID numbers). For
comparison, Table II also includes the 'gi' (GenInfo Identifier) accession numbers for strains
2603V/R (serotype V; MLST type ST106) [1] and NEM316 (serotype III; MLST type ST23) [3]. A
single row in Table II includes all homologs and, where applicable, paralogs within a single strain.

In contrast to Table II, coding sequences without homologs in any of the other six sequenced
genomes (i.e. unique to one strain within the seven strains) are listed in Table III. These are preferred
coding sequences of the invention e.g. when strain-specificity is desired. Each of the seven
sequenced genomes contains between 13 and 61 sequences not present in any of the other strains.
This variability exceeds that seen in the comparative genome hybridization analysis of reference 1.

Table IV lists coding sequences in the five newly sequenced genomes that do not have any homologs
in strains 2603V/R [1] or NEM316 [3]. These are preferred coding sequences of the invention
e.g. when sequences not known in the prior art are desired.

Table V lists 'core' GBS genes, namely those that are found in all seven sequenced genomes. These
'universal' GBS coding sequences are preferred for use with the invention e.g. when
strain-specificity is not desired, such as when designing a diagnostic test with high inter-strain
cross-reactivity, or when preparing a composition which will elicit antibodies with high inter-strain
cross-reactivity, or when screening for broad-range anti-GBS antibiotics. Table VI lists variable GBS
genes, namely those that are found in at least two sequenced genomes, but not in all seven. The
format of Tables V and VI follows that of Table II.

The GBS "pan-genome" can thus be divided into three parts: a core-genome, strain-specific sequences,
and "dispensable genes" shared only by some of the strains. The core genes describe the basic
aspects of GBS biology and major phenotypic traits, whereas dispensable and strain-specific genes
contribute to the observed genetic diversity of the species and might confer selective advantages,
such as adaptation to different niches, antibiotic resistance, and increased invasive capabilities.
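
The three-way division can be illustrated with a short sketch. It assumes that each row of homologs is held as a mapping from strain name to a list of identifiers, with an empty list where a strain has no homolog; this data layout and the function name are assumptions of the sketch. The three example rows are transcribed from Tables II and III for illustration.

```python
STRAINS = ("2603V/R", "18RS21", "515", "CJB111", "COH1", "H36B", "NEM316")


def classify(row):
    """Assign a row of homologs to one part of the pan-genome.

    row maps strain name -> list of identifiers (empty or missing if absent).
    """
    present = [s for s in STRAINS if row.get(s)]
    if len(present) == len(STRAINS):
        return "core"             # found in all seven genomes
    if len(present) == 1:
        return "strain-specific"  # unique to a single genome
    if len(present) >= 2:
        return "dispensable"      # shared by some, but not all, genomes
    raise ValueError("row contains no sequences")


core_row = {
    "2603V/R": [22535220], "18RS21": [1, 3919], "515": [8615],
    "CJB111": [12909], "COH1": [14061], "H36B": [22383], "NEM316": [24413708],
}
variable_row = {
    "2603V/R": [22534676], "18RS21": [], "515": [7679],
    "CJB111": [11911], "COH1": [16557], "H36B": [], "NEM316": [],
}
unique_row = {"18RS21": [1749]}

print(classify(core_row), classify(variable_row), classify(unique_row))
```

Counting strains rather than sequences means that paralogs within one strain (e.g. SEQ ID NOs 1 and 3919 in 18RS21) do not affect the classification.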

The vast majority of genes making up the core genome belong to the groups of housekeeping
functions, cell envelope, regulatory functions, and transport and binding proteins. However, about
one third of the shared genes fall into the annotation class of hypothetical proteins and proteins of
unknown function, thus suggesting that many aspects of basic GBS biology still need to be explored.
Because of their 'core' nature, however, these sequences still have utility as they can be used in
situations where inter-strain cross-reactivity is needed, without needing to know their true underlying
biological function. Hypothetical genes and genes of unknown function are much more heavily
represented among the dispensable genes, probably due to the fact that more functions have been
ascribed to better known (i.e. more frequently found) genes. This view is also supported by the
strain-specific genes being predominantly of unknown function. Furthermore, genes associated with
mobile and extrachromosomal elements are particularly abundant in this group, supporting the
hypothesis that the majority of specific traits depend upon phenomena of lateral gene transfer. On the
other hand, this class of genes is very poorly represented within the core genome, indicating that only
a few of these rearrangements have remained stable during evolution of GBS.

The core shared by all isolates (Table V) accounts for only about 80% of any single genome, with the
remaining 20% being absent in at least one other strain (Table VI). Approximately 1800 coding
sequences are shared by the sequenced GBS strains. The criteria for gene identity between genomes
were set low so that coding sequences were considered shared even if they were quite divergent in
sequence. The size of the core is thus likely to be an overestimate, but it substantially defines the
basic characteristics of the GBS species. As further GBS genome sequences become available then
this "core" may decrease (by analogy, a coding sequence would move from Table V to Table VI),
but for the purposes of the present invention the "core" is the group given in Table V. Even using the
sequences herein, the core decreases with the addition of each new genome, but extrapolation of the
curve indicates that the core stabilizes at around 1800 coding sequences and will remain constant
even as many more genomes are added.
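
The behaviour of the core as genomes are added can be pictured with the following minimal sketch. Each genome is represented as a set of gene-family identifiers, and the core after each addition is the running intersection. The function name and the toy sets are hypothetical and are not data from the sequenced strains.

```python
def core_size_curve(genomes):
    """Size of the shared core after each genome is added, in order."""
    sizes = []
    core = None
    for families in genomes:
        # the core can only stay the same or shrink as genomes are added
        core = set(families) if core is None else core & set(families)
        sizes.append(len(core))
    return sizes


# Toy gene-family sets for four hypothetical genomes.
toy_genomes = [
    {1, 2, 3, 4, 5},
    {1, 2, 3, 4, 6},
    {1, 2, 3, 7},
    {1, 2, 3, 8},
]
print(core_size_curve(toy_genomes))  # [5, 4, 3, 3]
```

The toy curve falls and then levels off, which is the qualitative shape described above for the GBS core.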

One mechanism by which bacteria can modulate their lifestyle and virulence in response to variable 
stimuli, stress conditions and adaptation to different niches is phase variation [163,164]. Such 
variation occurs by altering the length of short repeated DNA tracts within or immediately upstream 
of coding regions (contingency genes), thus causing frame-shifts and affecting protein synthesis. At 
least one important virulence-associated gene in GBS is regulated in this way [165], and so 
identification of further phase variable genes can identify new virulence factors. Virulence factors are
particularly useful as vaccine antigens, antibiotic targets, etc. Table VII shows such phase variable genes,
and these are preferred polypeptides for use with the invention. 
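
Purely as an illustration of how short repeated DNA tracts of the kind described above might be located, the sketch below scans a nucleotide sequence for tandem repeats of short units. The function name, the maximum unit length of 4 and the minimum of 5 copies are assumptions of this sketch; they are not the criteria used to compile Table VII.

```python
import re


def find_repeat_tracts(dna, max_unit=4, min_copies=5):
    """Return (start, unit, copies) for tandem repeats of short units.

    A tract is a unit of 1..max_unit bases repeated at least min_copies
    times in a row, e.g. a homopolymeric run or a dinucleotide repeat.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    tracts = []
    for match in pattern.finditer(dna.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        tracts.append((match.start(), unit, copies))
    return tracts


example = "ATGC" + "G" * 8 + "TTAC" + "AT" * 5 + "CG"
print(find_repeat_tracts(example))
```

Applied to a coding sequence together with its immediate upstream region, a scan of this kind would flag candidate contingency loci for closer inspection.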

It will be understood that the invention has been described by way of example only and modifications
may be made whilst remaining within the scope and spirit of the invention.
TABLE II — Homologs and paralogs

2603vr | 18rs21 | 515 | cjb111 | coh1 | h36b | NEM316
22535220 | 1, 3919 | 8615 | 12909 | 14061 | 22383 | 24413708
22535221 | 3 | 8617 | 12911 | 14063 | 22385, 22427 | 24413709
22535222 | 5 | 8619, 8703 | 12913 | 14065 | 22387 | 24413710
22535223 | 7 | 8621 | 12915 | 14067 | 22389 | 24413711
22535224 | 9 | 8623 | 12917 | 14069 | 22391 | 24413712
22535225 | 11 | 8625 | 12919 | 14071 | 22393 | 24413713
22533003 | 13 | 8627 | 12921 | 14073 | 22395 | 23094427
22533004 | 15 | 4303, 8629 | 12923 | 14075 | 22397 | 23094428
22533005 | 17 | 4305, 8631 | 12925, 12927 | 14077 | 22399 | 23094429
22533006, 22535061 | 19, 3631 | 4307, 8305, 8323, 8325, 8327, 8329, 8787 |  | 14079, 17271 | 22401 | [illegible]
22533007 | 21 | 4309 | 9607 | 14081 | 22403 | 23094431
22533008 | 23 | 4311 | 9609 | 14083 | 22405 | 23094432
22533009 | 25 | 4313 | 9611 | 14085 | 22407 | 23094433
22533010 | 27, 29 | 4315 | 9613 | 14087 | 22409, 22663 | 23094434
22533012 | 31 | 4317 | 9615 | 14089 | 22413 | 23094435
22533013 | 33 | 4319 | 9617 | 14091 | 22415 | 23094436
22533014 | 35 | 4321 | 9619 | 14093 | 22417 | 23094437
22533015 | 37 | 4323 | 9621 | 13135, 14095 | 22419 | [illegible]
22533016, 22534732 | 39 | 4325 | 9623, 12021 | 13137, 14097, 16681 | [illegible] | 23094439
22533017 | 41 | 4327 | 9625 | 13139, 13143, 14099 | 22423 | 23094440
22533018 | 43 | 4329 | 9627 | 13145, 13147, 13149, 14101 | 22425, 22623 | 23094441
22533020 | 45, 47, 49, 51 | 4331 | 8801 | 13181 | 17963 | 23094442
22533021 | 53 | 4333, 4335 | 8803 | 13183 | 17965 | 23094443
22533022 | 55 | 4337 | 8805 | 13185 | 17967 | 23094444
22533023 | 57 | 4339, 4341 | 8807 | 13187, 13189 | 17969 | 23094445
22533024 | 59 | 4343 | 8809 | 13191 | 17971 | 23094446
22533025 | 61 | 4345 | 8811 | 13193 | 17973 | 23094447
22533026 | 63 | 4347 | 8813, 8815 | 13195 | 17975 | 23094448
22533027 | 65 | 4349 | 8817 | 13197 | 17977, [illegible], 18035 | 23094449
22533028 | 67 | 4351 | 8819 | 13199, 13245, 13247, 17811, 17821, 17859 | 17979, 18037, 22555 | 23094450
22533029 | 69, 71, 3965, 4017, 4061 | 4353 | 8821 | 13201, 17803, 17891 | 17981 | 23094451
22533030 | 73, 75, 3963 | 4355 | 8823 | 13203 | 17983 | 23094452
22533032 | 77 | 4357, 4365 | 8825 | 13205 | 17985 | 23094453
22533033 | 79 |  | 8827 | 13207 | 17987 | 23094454
22533034 | 81 | 4361, 4369, 8707 | 8829 | 13209 | 17989, 22505 | 23094455
22533035 | 83, 85 | 4371 | 8831 | 13211, 13249, 13251, 13253 | 17991 | 23094456
22533036 | 87 | 4373 | 8833 | 13213 | 17993 | 23094457
22533037 | 89 | 4375 | 8835 | 13215, 17895 | 17995 | 23094458
22533038 | 91 | 4377, 4413, 4415, 4417 | 8837 | 13217, 13255, 17893 | 17997 | 23094459
2603vr | 18rs21 | 515 | cjb111 | coh1 | h36b | NEM316
[row start missing] | - | 13161 | - | -
- | - | - | - | 17619 | 22279 | -
- | - | - | - | 17605 | 22269 | -
- | - | - | 12551 | - | 22467 | -
- | - | - | 12149 | - | 21599, 22457 | -
- | - | - | 12147 | - | 22455 | -
- | - | - | 12145 | - | 22453 | -
- | - | - | 12139 | - | 22447 | -
- | - | - | 12137 | - | 22445 | -
- | - | - | 12133 | - | 22441 | -
- | - | - | 12131 | - | 22439 | -
 |  |  | 12129 |  | 22437 | 
 |  |  | 12127 | - | 22435 | -
- | - | - | 12125 |  | 22433 | -
- | - | - | 12119 | - | 21589 | 
- | - | - | 12117 | - | 21587 | -
- | - | - | - | 16093, 17805, 17929, 17939 | - | -
 |  |  |  | 16085, 17881 |  | -
 |  |  | 10831 |  | 20109 | -
- | - |  | 10827 | - | 20105 | -
- | - |  | 10017, 10019 | - | - | 
- | - |  | 10009, 12953 | - | - | -
- | - |  | - | - | 18237, 22135, 22643 | -
 | - |  | 9897 | - | 20367 | 
- | - |  | 10799 | 15375 | - | -
- | - |  | 12135 | - | 22443 | -
- | - |  | 12141 | - | 22449 | 
- | - |  | 12143 | - | 22451 | -
- | - |  | 12481 | 17143, 17947 | 21903 | 
- | - | - | 12605 | 17389 | 22031 | 
- | - |  | 12817 | 17629 | 22509 | -
 |  |  | 12929 |  | 18873 | 
 |  |  | 13045 | 15817 |  | 
 |  |  |  | 17213 | 21951 | 
 |  |  |  | 17615 | 22275 | 
 |  |  |  | 17617 | 22277 | 
 |  |  |  | 17621 | 22281 | 
TABLE III — Unique coding sequences



2603V/R 


22533205, 22533568, 22533571, 22533574, 22533576, 22533597, 22533706,
22533776, 22533967, 22534356, 22534614, 22534650, 22534875, 22534876, 
22534877, 22534880, 22534882, 22534885, 22534886, 22534887, 22534889, 
22534891, 22534892, 22534894, 22534895, 22534898, 22534900, 22534901, 
22534903, 22534906, 22534907, 22534908, 22534911, 22534912, 22534915, 
22534916, 22534917, 22534919, 22534924, 22535035, 22535159, 22535160, 
22535163, 22535164, 22535165, 22535166, 22535167 


18RS21 


1749, 3623, 3981, 3985, 3989, 3991, 3999, 4057, 4059, 4081, 4203, 4205, 4207


515 


4359, 4367, 4803, 4805, 4807, 4809, 4811, 4813, 5289, 5295, 5493, 5497, 5499,
5505, 5507, 5511, 5513, 5515, 5563, 6887, 8267, 8269, 8647, 8683, 8685, 8695, 
8725, 8727, 8783, 8789, 8797 


CJB111


9213, 9237, 9239, 10005, 10681, 10809, 10811, 12795, 12823, 12827, 12949,
12951, 12967, 12997, 13043, 13047, 13087, 13091, 13093, 13105


COH1 


13153, 13157, 13159, 13241, 13243, 13263, 15187, 15201, 15227, 15819, 15821, 
15823, 15825, 15827, 15829, 16015, 16019, 16021, 16023, 16539, 16561, 16565, 
17319, 17321, 17693, 17705, 17707, 17719, 17753, 17785, 17797, 17819, 17897, 
17921, 17933 


H36B 


18691, 19065, 19067, 19071, 19073, 19075, 19085, 19087, 19089, 19091, 19093, 
19095, 19099, 19103, 19111, 19113, 19115, 19117, 19119, 19123, 19125, 19127, 
19129, 19131, 19133, 19135, 19139, 19141, 19143, 19145, 19149, 19165, 20099, 
20401, 22529, 22531, 22533, 22535, 22545, 22547, 22557, 22559, 22561, 22565,
22571, 22585, 22589, 22621, 22641, 22667, 22671, 22679, 22695, 22699, 22705,
22715, 22717, 22721, 22723, 22725, 22733


NEM316 


23094662, 23094664, 23094667, 23094668, 23094669, 23094670, 23094794, 
23094796, 23094797, 23094798, 23094799, 23094802, 23094803, 23094806, 
23094808, 23094809, 23094810, 23094811, 23094812, 23094813, 23094814, 
23094815, 23094816, 23094818, 23094820, 23094821, 23094822, 23094823,
23094824, 23094825, 23094827, 23094828, 23094829, 23094830, 23094831,
23094832, 23094833, 23094835, 23095107, 23095109, 23095110, 23095111,
23095112, 23095115, 23095116, 23095119, 23095121, 23095122, 23095123, 
23095124, 23095125, 23095126, 23095127, 23095128, 23095129, 23095131, 
23095133, 23095134, 23095135, 23095136, 23095137, 23095138, 23095140, 
23095141, 23095142, 23095143, 23095144, 23095145, 23095146, 23095148, 
23095423, 23095425, 23095426, 23095427, 23095428, 23095429, 23095430, 
23095431, 23095433, 23095434, 23095435, 23095436, 23095437, 23095438,
23095440, 23095442, 23095443, 23095444, 23095445, 23095446, 23095447, 
23095448, 23095449, 23095450, 23095452, 23095455, 23095456, 23095459, 
[illegible], [illegible], [illegible], [illegible], 23095569, 23095570, 23095571,
23095572, 23095573, 23095574, 23095575, 23095576, 23095577, 23095578, 
23095579, 23095580, 23095581, 23095582, 23095583, 23095584, 23095585, 
23095586, 23095587, 23095614, 23095615, 23095617, 23095623, 23095624,
23095626, 23095627, 24412909, 24412910, 24412911, 24412912, 24412921, 
24412922, 24412923, 24412924, 24413555 
18RS21


283, 1123, 1127, 1129, 1273, 1275, 1731, 2321, 2367, 3839, 3841, 3843, 3845, 
3917, 4047, 4099, 4131, 4139, 4141, 4159, 4181, 4219, 4221, 4223, 4225, 4227,
4229, 4231, 4233, 4235, 4237, 4239, 4241, 4243, 4245, 4247, 4249, 4251, 4253,
4255, 4263, 4265, 4267, 4269, 4271, 4273, 4275, 4277, 4287 


515 


4459, 4479, 4601, 4793, 4797, 4801, 5285, 5291, 5293, 5559, 5561, 5565, 5567,
5569, 5573, 5575, 5577, 5579, 5581, 5585, 5587, 5589, 5591, 5593, 5595, 5621,
5625, 5627, 5629, 5687, 5753, 6179, 6181, 6209, 6211, 6821, 6957, 8315, 8317,
8319, 8321, 8331, 8333, 8335, 8665, 8667, 8673, 8791


CJB111


8901, 8921, 9043, 9897, 9911, 9915, 9917, 9919, 9981, 9985, 10009, 10017, 10019, 
10097, 10143, 10145, 10185, 10597, 10799, 10827, 10831, 11227, 12117, 12119, 
12125, 12127, 12129, 12131, 12133, 12135, 12137, 12139, 12141, 12143, 12145, 
12147, 12149, 12481, 12551, 12605, 12793, 12797, 12799, 12803, 12805, 12807, 
12809, 12813, 12815, 12817, 12819, 12821, 12831, 12929, 12953, 13045 


COH1 


13161, 13303, 13323, 13459, 14397, 14401, 14403, 14405, 14507, 14557, 14559, 
14619, 15211, 15375, 15785, 15803, 15817, 16085, 16093, 17143, 17213, 17281, 
17283, 17285, 17287, 17289, 17291, 17389, 17605, 17607, 17611, 17615, 17617, 
17619, 17621, 17623, 17625, 17627, 17629, 17631, 17641, 17643, 17645, 17647, 
17691, 17721, 17723, 17727, 17805, 17825, 17881, 17929, 17939, 17947 


H36B 


18085, 18105, 18237, 18873, 18893, 18895, 18931, 19255, 19259, 19261, 19263, 
19333, 19381, 19383, 19385, 19429, 19879, 20105, 20109, 20367, 20405, 20537, 
20595, 21587, 21589, 21599, 21903, 21951, 22031, 22135, 22269, 22271, 22275, 
22277, 22279, 22281, 22283, 22285, 22287, 22293, 22299, 22301, 22433, 22435,
22437, 22439, 22441, 22443, 22445, 22447, 22449, 22451, 22453, 22455, 22457, 
22467, 22507, 22509, 22643 
TABLE V — 'Core' coding sequences

2603vr | 18rs21 | 515 | cjb111 | coh1 | h36b | NEM316
22535220 | 1, 3919 | 8615 | 12909 | 14061 | 22383 | 24413708
22535221 | 3 | 8617 | 12911 | 14063 | 22385, 22427 | 24413709
22535222 | 5 | 8619, 8703 | 12913 | 14065 | 22387 | 24413710
22535223 | 7 | 8621 | 12915 | 14067 | 22389 | 24413711
22535224 | 9 | 8623 | 12917 | 14069 | 22391 | 24413712
22535225 | 11 | 8625 | 12919 | 14071 | 22393 | 24413713
22533003 | 13 | 8627 | 12921 | 14073 | 22395 | 23094427
22533004 | 15 | 4303, 8629 | 12923 | 14075 | 22397 | 23094428
22533005 | 17 | 4305, 8631 | 12925, 12927 | 14077 | 22399 | 23094429
22533007 | 21 | 4309 | 9607 | 14081 | 22403 | 23094431
22533008 | 23 | 4311 | 9609 | 14083 | 22405 | 23094432
22533009 | 25 | 4313 | 9611 | 14085 | 22407 | 23094433
22533010 | 27, 29 | 4315 | 9613 | 14087 | 22409, 22663 | 23094434
22533012 | 31 | 4317 | 9615 | 14089 | 22413 | 23094435
22533013 | 33 | 4319 | 9617 | 14091 | 22415 | 23094436
22533014 | 35 | 4321 | 9619 | 14093 | 22417 | 23094437
22533015 | 37 | 4323 | 9621 | 13135, 14095 | 22419 | [illegible]
22533016, 22534732 | 39 | 4325 | 9623, 12021 | 13137, 14097, 16681 | [illegible] | [illegible]
22533017 | 41 | 4327 | 9625 | 13139, 13143, 14099 | 22423 | 23094440
22533018 | 43 | 4329 | 9627 | 13145, 13147, 13149, 14101 | 22425, 22623 | 23094441
22533020 | 45, 47, 49, 51 | 4331 | 8801 | 13181 | 17963 | 23094442
22533021 | 53 | 4333, 4335 | 8803 | 13183 | 17965 | 23094443
22533022 | 55 | 4337 | 8805 | 13185 | 17967 | 23094444
22533023 | 57 | 4339, 4341 | 8807 | 13187, 13189 | 17969 | 23094445
22533024 | 59 | 4343 | 8809 | 13191 | 17971 | 23094446
22533025 | 61 | 4345 | 8811 | 13193 | 17973 | 23094447
22533026 | 63 | 4347 | 8813, 8815 | 13195 | 17975 | 23094448
22533027 | 65 | 4349 | 8817 | 13197 | 17977, [illegible], 18035 | [illegible]
22533028 | 67 | 4351 | 8819 | 13199, 13245, 13247, 17811, 17821, 17859 | 17979, 18037, 22555 | 23094450
22533029 | 69, 71, 3965, 4017, 4061 | 4353 | 8821 | 13201, 17803, 17891 | 17981 | 23094451
22533030 | 73, 75, 3963 | 4355 | 8823 | 13203 | 17983 | 23094452
22533032 | 77 | 4357, 4365 | 8825 | 13205 | 17985 | 23094453
22533034 | 81 | 4361, 4369, 8707 | 8829 | 13209 | 17989, 22505 | 23094455
22533035 | 83, 85 | 4371 | 8831 | 13211, 13249, 13251, 13253 | 17991 | 23094456
22533036 | 87 | 4373 | 8833 | 13213 | 17993 | 23094457
22533037 | 89 | 4375 | 8835 | 13215, 17895 | 17995 | 23094458
22533038 | 91 | 4377, 4413, 4415, 4417 | 8837 | 13217, 13255, 17893 | 17997 | 23094459
22533039 | 93 | 4379 | 8839 | 13219, 13257 | 17999 | 23094460
22533040 | 95 | 4381, 4419, 44[truncated] | 8841 | 13221, 13259 | 18001 | 23094461
2603vr


18rs21


515


cjb111


coh1


h36b


NEM316






91 
d I 










22533041 


97 


AQ8Q 

4ooo 


8843 


139P3 13p«1 


18003 


23094462 


22533042 


99 


4O00 


884^ 


13PP0 1 


18005 


23094463 i 

imm W W" • W 


22533043 


101 


4387 


8847 


13227 


18007 


23094464 


22533044 


103 


4ooy 


oo4y 


1^990 


1800Q 
1 OUU£7 


230Q4465 


22533045 


105 


4oy 1 


OOO I 


139^1 




230Q4466 


22533046 


107 


4393 


8853 


13233 


18013 


23094467 


22533047 


109 


4395 


8855 


1 OOOK 1 7Q7K 
1 O£O0,l /o/o, 

17Q41 


1 Qfi1 C 1 QOQQ 

oui 0,1 ouoy, 
lftO/11 9Pfi31 


^ouyH-'+uo 


22533048 


111 


4oy/ 


QOC7 -i OfiOQ 

ooo/ , loU^o 


1 Q907 


180.17 

I OU I / 


P30Q44fiQ 1 


22533049 


113,115,117,3 
949 


a onn 


oooy 




i ou i y 


P30Q4470 


22533051 


121 


44U0 


OOOO 




1 3023 


23094473 


22533054 


125 


4409 


8867 


13269 


18027 


23094475 


22533055 


127 


4411 


oooy 


1 Q971 


i ou^y 




22533056 


129 


44<£o 


QQ71 


1 Q07Q 


I OUO I 


P30Q4477 


22533057 


131 


A /IOC AAO~7 

44^0, 44*:/ 


oo/o 


1*397*, 
I Oc.1 O 


18043 

I Ov*tw 


23094478 


22533058 


133 


A /10f\ A /lOI A A 

44^y,44o 1 ,44 

O.Q 

oo 


OO/O 


10977 1Q97Q 


1 ROA^ 1 80A7 

1 OU^rO, 1 OUt-/ , 

18049 18051 


P30Q4479 


22533059 


135 


AA1CL AA*X7 
*t 4 K5D,H- A rO/ 


8877 
OO / / 


13P81 


18053 


23094480 


22533060 


137 


44oy 


oo/y 


1775Q 


180^0 


23094481 


22533061 


139 


4441 


8881 


13283 


18057 


23094482 


22533063 


143 


A A AO 

444o 


OOOO 


I O^OO 


18061 
I OUO \ 


P30Q4484 


22533064 


145 


A A AC 

4445 


QQQ7 
OOO/ 


1 Oc.Q / , I ucoy 


ioncq 1QH71 
I OUOO, f OU/ i 


P30Q448o 


22533065 


147 


A A A~7 

4447 


QQQQ 

oooy 


i o^y i ! 


1 anKp; 1 807q 

I OUOO, I OU/ O 


P30Q448R 


22533066 


149 


A A A A 

4449 


QQQ1 

ooy i 


1 09QO 

lo^yo 


18087 180RQ 
I OUO / , I OU057, 

1807o 


93004487 


22533067 


151 


4451 


8893 


13295 


18077 


23094488 


22533068 


153 


A A CO 

4453 


obyo 




1807Q 

I OU / v7 


930Q448Q 


22533069 


155 


A A CC 

4455 


QQQ7 

ooy/ 


i o^yy 


18081 

1 OUO 1 


930Q44Q0 


22533070 


157 


! A A C~7 

4457 


ooyy 


I OOU I 


18083 

l OUOO 


930Q44Q1 


22533071 


159 


A AGH 
4401 


oyuo 


101 10.QAE. 
I O I DO, I OOUO, 


18087 
I OUO/ 


930Q44QP 


22533072 


161 


4463 


8905 


13307 


18089 


23094493 


22533073 


163 


A Add 

4400 


oyu/ 


1QonQ 

i oouy 


180Q1 

1 OU<7 1 


9^nQ4494 


22533074 


165 


AACT7 

44b/ 


oyuy 


1qq11 


180Q3 


P3094495 


22533075 


167 


AARQ 

44oy 


I QQ1 1 

oy 1 1 


13313 

I OO I o 


180QO 
1 OU530 


93094496 


22533076,225 
34790 


169,3205 


£900 

o^yy 


19187 
I £ I O/ 


1 OOU \ 


P1615 

Cm 1 U 1 \J 


23094497 244 
13378 


I 22533077 


171 


4471 


RQ13 


13315 


18097 


| 23094498 


22533078 


173 




0\7 1 vJ 


13317 


18099 


23094499 


22533080 


175 


AA7R 


8Q17 
o>? \ ( 


13319 


18101 


23094500 


22533081 


177 


AA77 


8Q1Q 

Ov? lo 


133P1 17737 


18103 


23094501 


22533082 


179 


4481 


I 8QP3 


13325 


18107 


23094502 


22533083 


lot 


448Q 
moo 


8QPR 


! 13327 


18109 


23094503 


22533084 


183 


4485 


8927 


13329 


18111 


23094504 


22533085 


185 


4487 


8929,8931 


13331 


18113 


23094505 


22533090 


193 


4495 


8939 


13333,13335 


18121,18123 


23094510 


22533091 


195 


4497 


8941 


13337 


18125 


23094511 


22533092,22? 


J 199,201,503 


4499,4875 


8943,8945,94 


13341,13345, 


18127,18135, 


23094512 
2603vr


18rs21 


515 


cjb111 


cohl 


h36b 


NEM316 


57 22533270 








I ODDO, lODOl , 
I 0000 


1 o4o/,l o4bl 




22533094 225 
33258 225332 

V* W fc— W w j Li U W w W Km 

60 






8Q47 9477 


1 '3 / 34 i 3 1 WA7 

13665 136R7 
17743 


I O I tLZfj I 0 I Of 


^ouy'f o 1 o 


22533095 


205 


4503 


8949 


13349 


18139 


23094514 


22533096 


207 


4505 


8951 


13339 13351 

1 WW WW, t www 1 


18141 


23094515 

w w w i w 1 w 


22533097 


209 


4507 


8953 


13353 


18143 


23094516 

*-ww w~w 1 \J 


22533098 


211 


4509 


8955 


13355 13357 

1 wwww, 1 www / 


18145 

1 w J i w 


23094517 

C.WWWTW 1 / 


22533099 


213 


4511 


8957 


13359 


18147 

1 w ) *T / 


23094518 


i 22533100 


215 


4513 


8959 


13361 


18149 


2309451 9 


22533101 


217 


4515 


8961 

www 1 


13363 

1 \J w VJ vJ 


18151 18171 


23094520 


225331 02 


219 


4517 


8963 


13365 

1 UU ww 


18153 

1 w i 


230945P1 


22533104 


221 


4519 


8965 


13367 


18155 

1 w 1 ww 


23094522 


22533105 


223 


4521 


8967 


13369 

1 WW WW 


18157 

■ V 1 W I 


23094523 


22533106 


225 


4523 


8969 


13371 

1 WW / 1 


18159 

1 w 1 ww 


23094524 


22533107 

(->t_W WW 1 VI 


227 


4525 


8971 


13373 

1 \JO / W 


18161 


2^094525 

c-OUC7*tO^O 


22533108 


229 


4527 


8973 


13375 13385 

1 WW / W j 1 Wwww 


18163 

I W 1 WW 


23094526 


22533116 

Kaah>W WW 1 1 V 


231 


4549 


8991 

www I 


13399 


18183 

I w 1 ww 


23094534 


22533117 


233 


4551 


8993 


13401,13403 


18185 


23094535 


22533118 


235 


455^ 


R995 


1 ^405 I 


1ft1ft7 
I O I o / 


oqrjQ/iciQfi 
^ouy^twOw 


22533119 


237 


4555 


8997 


11407 


1ft1 AQ 




22533120 


239 


4557 


ftQQQ 

0\/w'\7 


I Of i o 


i o i y o 


OOAQ/lcqO 


22533121 


241 


455Q 4561 45 
63 


9001 


11415 


I O l w*0 


oqnQ/iMQ 

^.ouyH-oOw' 


22533122 


243 


4565 

i w w w 


9003 


13417 13443 


18197 


2^094540 


22533123 


245 


4567 


9005 


1 141 9 1 9445 

1 0*T 1 w, 1 Ot-T-nJ 


1ft1QQ 
i o i v?y 


2^004541 

c.O\Ja f £ rO e + I 


22533124 


247 


4569 


9007 


11421 


I OcXf I 


9^0Q4542 
-COUyH-Of^ 


22533125 


249 251 253 4 
055 


4571 


9009 


13423 


I QcAJO 


OQf)Q454^ 


22533126 


255 


4573 


9011 


13425 


18205 


23094544 


1 22533128 

L_>£_ W W W 1 W 


257 


4577 


9015 

WW 1 W 


13429 


18209 


2^094545 


22533133 


259,261 


4585 


9023 


13437 


18217 


23094549 


22533134 


263,265,267 


4587 


9025 


13439 


18219 


23094550 


225^91 ^9 


9fiQ 271 

£Ow,£/ I 


*foy i 


yuoo 


1 04 0 I , i OHO ( 


\OeLcl 


£COUy4wO0 


22533140 


273 






11AE1 

1 OH-00 ' 
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- 


- 22534663 




7653 


11885 " 


T6525 


• -"21345" "" 


' '24413258 


22534664 


- 


7655 


11887 


16527 


21347 


24413259 


22534665 


- 


7657 


11889 


16529 


21349,21351 


24413260 


22534666 


- 


7659 


11891 


16531 


21353,21355 


24413261 


22534667 


- 


7661 


11893 


16533 


21357 


24413262 


22534676 


- 


7679 


11911 


16557 


- 


- 


22534693 


- 


7711 


11941 


16603 


21405 


24413284 


22534694 




7713 


11943 


16605 


21407,21409 


24413285 


22534695 




7715,7717 


11945 


16607 


21411,21413 


24413286 


22534696 




7719 


11947 


16609 


21415 


24413287 


22534697 




7721 


11949 


16611 


21417,22637, 
22639 


24413288 
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ll-'260SVr ILl 




»HfwLJ. 


cjb111 


coh1


h36b 


NEM316 


22534698 




7723 


11951 


16613 


21419 


24413289 


22534706 


- 


7739 


11969 


16629 


21435 


24413297 


22534707 


- 


7741 


11971 


16631 


21437,21443 


24413298


22534713 


- 


7753 


11981,11983, 
11985 


16647 


21453 


24413303 


22534714 


- 


7755 


11987 


16649 


21455 


24413304 


22534716 


- 


7759 


11991 


16653 


21459 


24413306 


22534718 


- 


7763 


11997 


16657


21463 


24413308 


22534734 


- 


- 


12023 


16685 


21497 




22534735


- 


- 


12025 


16687 


21499 




22534780


- 


7903


12113 


16775 


21583 


24413369 


22534788 


- 


7917,7919 


12163 


16793,16797 


21611 


24413376


22534789 


- 


6297


12165


16799 


21613 


24413377 


22534856 


- 


8027


12299 


16953 


21737 


24413439 


22534883 


- 


- 


10039 


- 






22534884 


- 




10037,13021 




-


_ 


22534934 


- 


8077 


12353 


17003 


21793 




22534959 


- 


8125 


12397,12403 


17059 


21823 


24413485 


22534979 


- 


8167 


12441


17103 


21861 


24413504 


22534996 


- 


8207 


12475 


17137 


21895 




22535024 


- 


8375 


12577 


17351 


22003 


-


22535033 


- 


-


- 






24413551 


22535034 


- 


8297 


- 


- 






22535051 


- 


8287 


- 


17251 






22535052 


- 


8289 


- 


17253 






22535053 


- 


8291 


- 


17255 






22535054 


- 


8293 




17257 


-




22535055 


- 


8295 


-


17259 


-




22535072 


- 


8361 


-


17317 


21985 




22535074 


- 


8365 


-


17375 


21991 


24413569 


22535086 


- 


8387 


12589 


17367 


22015 


24413580 


22535127


- 


8467 


12667 


17469 


22093 


24413617 


22535128 


- 


8469 


12669 


17471 


22095 


24413618 


22535129 


- 


8471 


12671 


17473 


22097 


24413619 


22535130 


- 


8473 


12673 


17475 


22099 


24413620


22535131 


- 


8475 


12675 


17477 


22101,22675 


24413621 


22535132 


- 


8477 


12677 


17479 


22103 


24413622 


22535133 


- 


8479 


12679 


17481,17483 


22105 


24413623


22535145 




6435 


12707 


17509,17783


22187




22535176 


- 


8529 


12771 


17567 


22245 


24413666 


22535203


- 


8581,8583 


12877 


17685 


22341 


24413692 


22535204 


- 


8585 


12879 


14029 


22343 


24413693 


- 


- 


4401 


- 




_ 


23094471 






4403 








23094472 


- 


- 




9109 


- 


- 


23094588 






4671 


9111 






23094589 






4673 


9113 






23094590 










16569 




23094789,23095102,23095468
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tt^eoWr' U 






cjb111 


coh1


h36b 


NEM316 


_ 


_ 


8341 




17297 




23094897 






8339 


_ 


17295 


-


23094898 






8337 


_ 


17293 


-


23094899 






8263 


_ 




-


23094904 


_ 


_ 






-


18883 


23094908 


_ 


_ 


-


9707 


-




23094916 


_ 




-


10003 


-


-


23095036 


- 


- 


8313,8687 




17279,17873 




23095620 






6817 




15641 




23095712 


_ 


_ 


6819 




15643 


20403 


23095713 


- 


- 






15645 




23095714 






6977 


11245 




22485,22487,
22553


24412894


-


_ 


8509 


12745,13029 


17547 


22225,22503 


24412898 






8505 


12741,13013, 
13015 


17543


20553,20555,
22221


24412900 


-




8503 


12739,13011 


17541 


20557,22219 


24412901 


-


-




13009 




20559 


24412902 


_ 






13007 




20561 


24412903 


-






13005 




20563,22581 


24412904 


_ 


_ 


-






20653 


24412941 


_ 


-


8659 


11303 


15903 


20739 


24412980 


-


_ 


7797 








24413321 






7799,7801 


-


-


-


24413322 




_ 


7803 


-


_ 




24413323 


_ 


_ 


7805 


_ 


- 




24413324 


- 


_ 


7807 


-


-


_ 


24413325 






7809 


_ 


- 




24413327 




_ 


7813 


_ 




-


24413328 






7815 








24413329 


- 


- 


7951 


12223 




21665 


24413403 








12533 


17323 




24413546 


- 


- 


- 


12535 


17325 




24413547 








12541 


17333


OIQfi-l 00A7K 


2441 1R4Q P44 
13557 244135 
64 




-




12549 


17329 


21957,22461, 
22465 


24413561 




_ 


_ 


12553 


_ 


22469,22471 


24413562 


_ 




_ 




17335 


22477 


24413565 


-




8641 


12731 


17533 


22211 


24413646 


-


_ 


8639 


12733 


17535 


22213 


24413647 


-


_ 


8499,8637 


12735 


17537 


22215 


24413648 


- 




8501 


12737 


17539 


22217 


24413649 


- 




8507 


12743,13017 


17545 


22223 


24413652 


- 


- 


8511 


12747 


17549 


22227 


24413654 






8513 


12749,12751 


17551 


22229 


24413655 






4459 


8901 


13303 


18085 








4479 


8921 


13323 


18105 








4793 




17727 
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ibii^i 11 " 1 '" 




cjb111 


coh1


h36b 


NEM316 


- 


- 


4797 


- 


17723 


_ 


_ 


-


- 


4801 


- 


17721 


_ 


_ 


- 


- 


5285 


- 


- 


18931 




- 




5291,5293 


- 


17691 






- 


- 


5629 


9919 


14405 


19263 


_ 


- 


- 


5687 


10097 


14507 


19333 




- 


- 


5753 


10185 


14619 


19429 


_ 


- 




6181 


10597 


- 






- 




6209 


9981 








- 


- 


6211 


9985 


15211 


_ 


_ 


- 


- 


6821 


- 


- 


20405 


_ 


- 


- 


6957 


11227 


15785 


20537 




- 


- 


8315 


- 


17281 


- 


-


- 


- 


8317 


- 


17283 


- 


_ 


- 


- 


8319 


- 


17285 






- 


- 


8321,8331 


- 


17287 


_ 


_ 


- 


- 


8333 


- 


17289 


_ 




- 


- 


8335 


- 


17291 


-


_ 


- 


- 


8673,8791 


_ 


13161 


-


-


- 


- 


- 




17619 


22279 


-


- 


- 


- 


- 


17605 


22269 


_ 


- 


- 


- 


12551 


_ 


22467 


_ 


- 


- 


- 


12149 


_ 


21599,22457 




- 


- 


- 


12147 


- 


22455 


_ 


- 


- 


- 


12145 




22453 




- 


- 


- 


12139 


_ 


22447 




- 


- 


- 


12137 


_ 


22445 




- 


- 


- 


12133 




22441 


-


- 




- 


12131 


- 


22439 


_ 


- 


- 


- 


12129 




22437 




- 


- 


- 


12127 




22435 


-


- 


- 


- 


12125 




22433 


_ 


- 


- 


- 


12119


- 


21589 


- 


- 


- 




12117 


- 


21587 


_ 


- 


- 


- 


10831 


- 


20109 




- 


- 


- 


10827 




20105 


_ 


- 


- 


- 


9897 


-


20367 




- 


- 


- 


10799 


15375 


-


-






— " — ' ■ ■ 


12135 


-


22443 


_ 


- 


- 


- 


12141 




22449 




- 


- 


- 


12143 


- 


22451 


-


- 


- 


- 


12481 


17143,17947 


21903 


_ 


- 


- 


- 


12605 


17389 


22031 




- 


- 


- 


12817 


17629 


22509 




- 


- 


- 


12929 


- 


18873 


- 


- 


- 


- 


13045 


15817 


- 












17213 


21951 












17615 


22275 












17617 


22277 












17621 


22281 
Variable gene (1)    Location of repeat    Position of repeat (2)


SEQ ID NO: 1801


5' end


1 /I c 


QThO TT> T\TO- 1 QA1 


d ena 


2 


kjjj/v^ .1,1 -/ iNiwJ. yj i 


C1 pxryA 

o ena 


Ho 


OJIyls^ JUL/ J_NV_/, iy^fi 


2> ena 


13 


^ThO TO XTO- 91 41 


o ena 


11 


OJC/V^ JUL* iNvJ. ZoOj 


CI pvn/^ 

r> ena 


21 


Cl?n TD MO- 9£R7 
oJj/V^ JUL' lNVJ. Zoo / 


C» 

d ena 


O AC 

295 


9ThO TD XFO' ^1 Q1 

OJD V,/ JUL/ 1NVJ. Jl"l 


CI AtlJ J 

d ena 


lo 


WO TD "NTO- 94.47 


CI ~ n A 

d ena 


3 


SFO TD "NTO- ^77S 

O JC/V^ .1 LJ IN W , J> / / J 


CI ~r*A 

d ena 


JU1 


SEQ ID NO: 6773    5' end      76
SEQ ID NO: 3723    Middle      1120
SEQ ID NO: 2313    3' end      3185
SEQ ID NO: 719     Promoter    40
SEQ ID NO: 4631    Promoter    103
SEQ ID NO: 2373    Promoter    1



(1) Given for one strain only; Table II can be used
to find any homologs in other strains.

(2) Relative to ATG.
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CLAIMS 

1. An isolated polypeptide comprising an amino acid sequence which has at least 75% sequence identity
to one or more of the even-numbered amino acid sequences selected from the group consisting of
SEQ ID NOS:2-22740.

2. The polypeptide of claim 1, comprising one or more of the even-numbered amino acid sequences 
selected from the group consisting of SEQ ID NOS:2-22740. 

3. An isolated polypeptide comprising a fragment of at least 7 consecutive amino acids from one or 
more of the even-numbered amino acid sequences selected from the group consisting of SEQ ID 
NOS:2-22740. 

4. The polypeptide of claim 3, wherein the fragment comprises a T-cell or a B-cell epitope from an 
even-numbered amino acid sequence selected from the group consisting of SEQ ID NOS:2-22740. 

5. An antibody which binds to the polypeptide of any preceding claim. 

6. The antibody of claim 5 which is monoclonal. 

7. An isolated nucleic acid comprising a nucleotide sequence which has at least 75% sequence identity
to one or more of the odd-numbered nucleotide sequences selected from the group consisting of SEQ
ID NOS:1-22739.

8. The nucleic acid of claim 7, comprising a nucleotide sequence which is an odd-numbered nucleotide 
sequence selected from the group consisting of SEQ ID NOS:1-22739.

9. An isolated nucleic acid which can hybridize to the nucleic acid of claim 8 under high stringency 
conditions. 

10. An isolated nucleic acid comprising a fragment of 10 or more consecutive nucleotides from one or 
more of the odd-numbered nucleotide sequences selected from the group consisting of SEQ ID 
NOS:1-22739.

11. An isolated nucleic acid encoding the polypeptide of any one of claims 1 to 4. 

12. A composition comprising: (a) polypeptide, antibody, and/or nucleic acid of any preceding claim; and 
(b) a pharmaceutically acceptable carrier. 

13. The composition of claim 12, further comprising a vaccine adjuvant. 

14. The nucleic acid, polypeptide, or antibody of any one of claims 1 to 11 for use as a medicament.

15. A method of treating a patient, comprising administering to the patient a therapeutically effective
amount of the composition of claim 12. 

16. Use of the nucleic acid, polypeptide, or antibody of any one of claims 1 to 11 in the manufacture of a
medicament for treating or preventing disease and/or infection caused by GBS. 
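
The numeric conventions recited in the claims (even-numbered SEQ ID NOS for amino acid sequences, odd-numbered SEQ ID NOS for nucleotide sequences, minimum fragment lengths, and a percentage identity threshold) can be illustrated with a minimal sketch. The function names and the toy sequences below are hypothetical and are not taken from the sequence listing. The identity calculation assumes two sequences that have already been aligned to equal length without gaps, which is a simplification for illustration and not necessarily the comparison method used elsewhere in this application.

```python
MAX_SEQ_ID = 22740  # highest SEQ ID NO recited in the claims


def is_polypeptide_seq_id(n: int) -> bool:
    """Even-numbered SEQ ID NOS 2-22740 denote amino acid sequences."""
    return 2 <= n <= MAX_SEQ_ID and n % 2 == 0


def is_nucleotide_seq_id(n: int) -> bool:
    """Odd-numbered SEQ ID NOS 1-22739 denote nucleotide sequences."""
    return 1 <= n <= MAX_SEQ_ID - 1 and n % 2 == 1


def shares_fragment(candidate: str, reference: str, min_len: int) -> bool:
    """True if candidate contains a run of min_len consecutive residues of reference.

    min_len would be 7 for the polypeptide fragments of claim 3 and
    10 for the nucleic acid fragments of claim 10.
    """
    if min_len <= 0:
        raise ValueError("min_len must be positive")
    return any(
        reference[i:i + min_len] in candidate
        for i in range(len(reference) - min_len + 1)
    )


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical positions in two pre-aligned, equal-length sequences.

    Assumption: ungapped, position-by-position comparison (illustrative only).
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical toy sequences, not drawn from the sequence listing.
    print(is_polypeptide_seq_id(1802), is_nucleotide_seq_id(1801))
    print(shares_fragment("XXMKTAYIAKQ", "MKTAYIAGG", 7))
    print(percent_identity("ACGT", "ACGA") >= 75.0)
```

Under these assumptions, a candidate would meet the 75% threshold of claims 1 and 7 when `percent_identity` returns 75.0 or more against a listed sequence, and the fragment test mirrors the "at least 7 consecutive amino acids" and "10 or more consecutive nucleotides" wording of claims 3 and 10.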
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